Re: Some very nice broken IPv6 networks at Google and Akamai

2014-11-11 Thread Jeroen Massar
On 2014-11-11 16:00, Emanuel Popa wrote:
 Hi,
 
 Is there any way to intentionally and immediately get onto Google's DNS
 blacklist, in order to avoid similar future outages affecting
 only IPv6 traffic?
 http://www.google.com/intl/en_ALL/ipv6/statistics/data/no_.txt
 
 Or maybe the smart thing to do is to build another ISP-controllable
 blacklist of broken domains and tell BIND on the caches to return only
 A records for blacklisted domains. Or the other way around: only AAAA
 records for IPv4-broken/blacklisted domains...

As most modern clients do Happy Eyeballs, you could just null-route the
destination prefixes and see all clients fall back to IPv4.
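For reference, the fallback behaviour that Happy Eyeballs provides can be sketched roughly like this (a toy model, not the full RFC 6555 algorithm; `connect_v6` and `connect_v4` are hypothetical stand-ins for real per-family connection attempts):

```python
import concurrent.futures

def happy_eyeballs(connect_v6, connect_v4, head_start=0.3):
    """Prefer IPv6, but stop waiting after `head_start` seconds and
    race IPv4 against it, taking whichever succeeds first."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        v6 = pool.submit(connect_v6)
        try:
            return v6.result(timeout=head_start)  # IPv6 answered quickly
        except concurrent.futures.TimeoutError:
            pass                                  # too slow: start the race
        except Exception:
            return connect_v4()                   # IPv6 failed outright
        v4 = pool.submit(connect_v4)
        done, _ = concurrent.futures.wait(
            (v6, v4), return_when=concurrent.futures.FIRST_COMPLETED)
        for fut in done:
            if fut.exception() is None:
                return fut.result()
        # the first finisher failed; wait for the other family
        for fut in (v6, v4):
            if fut not in done:
                return fut.result()
        raise OSError("both address families failed")
```

This is why null-routing one family merely hides breakage from Happy Eyeballs clients: the other family silently wins the race.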

But doing that is rather evil, especially at the ISP level. We could have
done that for SixXS and given people working connectivity that way, but it
would not have actually resolved the problem, just hidden it.

If you expect them to have outages that they cannot quickly see, then
you should also expect a blacklist like that to be broken or not
properly updated. Hence, better to see the problems and to alert the
folks so that they can fix these issues properly (though Google is now
just hacking around it with MSS clamping...).
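The blacklist variants quoted above boil down to a small resolver policy decision. As an illustrative sketch only (in a real deployment this would live in the resolver itself, e.g. via BIND's response-policy zones or its AAAA-filtering options, not in application code):

```python
def answer_types(domain, v6_broken, v4_broken):
    """Which DNS record types should the cache hand out for `domain`?

    Implements the two blacklist variants from the thread: serve only
    A records for domains with broken IPv6, or only AAAA records for
    domains with broken IPv4.
    """
    if domain in v6_broken:
        return {"A"}             # IPv6 path broken: steer clients to IPv4
    if domain in v4_broken:
        return {"AAAA"}          # IPv4 path broken: steer clients to IPv6
    return {"A", "AAAA"}         # healthy: normal dual-stack answers
```

The obvious drawback, as noted above, is that the blacklist itself now needs the monitoring and maintenance that the broken site lacked.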


They typically do not have these issues; they just did not notice it
this time around, and thus it took a while for them to wake up
(timezones :)), figure out what it was, and fix the issue.

I am fairly confident though that Google is now monitoring their stuff
correctly. Lots of good folks there, stuff breaks, they fix it.

Greets,
 Jeroen



Re: Some very nice broken IPv6 networks at Google and Akamai

2014-11-11 Thread Nick Hilliard
On 11/11/2014 15:00, Emanuel Popa wrote:
 Is there any way to intentionally and immediately get onto Google's DNS
 blacklist, in order to avoid similar future outages affecting
 only IPv6 traffic?
 http://www.google.com/intl/en_ALL/ipv6/statistics/data/no_.txt
 
 Or maybe the smart thing to do is to build another ISP-controllable
 blacklist of broken domains and tell BIND on the caches to return only
 A records for blacklisted domains. Or the other way around: only AAAA
 records for IPv4-broken/blacklisted domains...

... or alternatively, depend on Google, Akamai and others not breaking.
This is what we do for IPv4, and it normally works well, but not always.

Bear in mind that every time a hack is installed to work around a
potential future problem, that hack needs maintenance, and if it breaks,
there's a chance that the resulting damage will be at least as bad as what
it was seeking to avoid in the first place.  Unless there are persistent
reliability problems, hacks tend not to be worth it.

Nick



Re: Some very nice broken IPv6 networks at Google and Akamai

2014-11-11 Thread Jeroen Massar
On 2014-11-11 19:09, Andras Toth wrote:
On Tue, Nov 11, 2014 at 3:36 PM, Jeroen Massar jer...@massar.ch wrote:
 
 If you expect them to have outages that they cannot quickly see, then
 you should also expect a blacklist like that to be broken or not
 properly updated. Hence, better to see the problems and to alert the
 folks so that they can fix these issues properly (though Google is now
 just hacking around it with MSS clamping...).
 
 
 [de-cloak]
 
 Google has been doing MSS clamping for a long time, I've seen this
 myself in packet captures and Lorenzo also confirmed in his email:
 ...some Google servers temporarily stopped doing MSS clamping.
 
 They do it for a good reason: to prevent PMTUD as it introduces delay
 and their customers (eyeballs) wouldn't like it. Lorenzo and others
 explained this too several times.

That explanation was, for me at least, first seen in this thread.

As stated, the MSS clamping is just hiding the real problems. It does
not properly resolve anything.

 The world is not spinning around sixxs and your design ideas. Please
 turn off write-only.

Wow, here come the ad hominem attacks; stay in lurk mode if you can't
handle people raising issues. If I had not commented on this problem,
it would never have come to light... maybe only in several years, when
nothing could have been done anymore. But today, we can still fix things.

Please realize that the world does have a lot more users than SixXS.

Noting problems and properly fixing them are important.

Greets,
 Jeroen



Re: Some very nice broken IPv6 networks at Google and Akamai

2014-11-10 Thread Sander Steffann
Hi Lorenzo,

On 9 Nov 2014, at 22:10, Lorenzo Colitti lore...@google.com wrote:
 On Sat, Nov 8, 2014 at 11:48 PM, Jeroen Massar jer...@massar.ch wrote:
 The issue with IPv6 access to Google should now be resolved.  Please let
 us know if you're still having problems.
 
 The fun question of course is: what/why/how the heck happened?
 
 Another fun question is why folks are relying on PMTUD instead of adjusting 
 their MTU settings (e.g., via RAs). But relying on PMTUD still imposes a 
 1-RTT penalty on every TCP connection to an IPv6 address you haven't talked 
 to in the last few minutes. Why would you do that to your connection?

I guess most users wouldn't really notice a 1-RTT delay. But I agree that it is 
less than optimal that every outbound connection from a network behind a 
non-1500-MTU link has to suffer this penalty. Unfortunately the current choices 
seem to be to either limit the link MTU (and making traffic to e.g. the local 
NAS suffer as well) or suffer the 1-RTT penalty.

 As to what happened: what happened here is that some Google servers 
 temporarily stopped doing MSS clamping. That was an outage, and AIUI it has 
 since been fixed. (Some parts of) Google infrastructure do not do PMTUD for 
 the latency reasons above and for reasons similar to those listed in 
 https://tools.ietf.org/html/draft-v6ops-jaeggli-pmtud-ecmp-problem-00 .

Thank you for the information. Great to have real data instead of guesses and 
speculation :)

Cheers!
Sander



Re: Some very nice broken IPv6 networks at Google and Akamai

2014-11-10 Thread Sander Steffann
Hi Philipp,

 On 10 Nov 2014, at 21:09, Philipp Kern pk...@debian.org wrote:
 
 On Mon, Nov 10, 2014 at 07:36:22PM +0100, Sander Steffann wrote:
 I guess most users wouldn't really notice a 1-RTT delay.
 
 Depends on the RTT. In mobile networks it generally sucks.

Good point :)
Sander



Re: Some very nice broken IPv6 networks at Google and Akamai

2014-11-09 Thread Daniel Austin

Hi,

I've been having terrible connectivity to Google via IPv6 for the last few 
days (I'd even resorted to using Bing!), but I can confirm it is working 
fine for me today.



Thanks,

Dan.


On 09/11/2014 06:26, Joe Hamelin wrote:

Google did have some issues, look at the outage list.  They are resolved
now:


  Damian Menscher dam...@google.com


6:44 PM (3 hours ago)


The issue with IPv6 access to Google should now be resolved.  Please let
us know if you're still having problems.

--
Joe Hamelin, W7COM, Tulalip, WA, 360-474-7474

On Sat, Nov 8, 2014 at 7:09 PM, Brian E Carpenter
brian.e.carpen...@gmail.com wrote:

On 09/11/2014 09:19, sth...@nethelp.no wrote:
 I'm not a native speaker of English, but I struggle to understand it
 any other way than you're saying there's something broken about
 Yannis' deployment. I mean, your reply wasn't even a standalone
 statement, but a continuation of Yannis' sentence. :-P
 That statement is correct though. As Google and Akamai IPv6 are
 currently broken, enabling IPv6 thus breaks connectivity to those sites.

 Not enabling IPv6 thus is a better option in such a situation.

 I'm afraid I don't see the supporting evidence here. From my point
 of view, Google and Akamai IPv6 both work just fine.

I have to say they both look a bit spotty from Honolulu right now, e.g.

C:\windows\system32>ping -6 www.google.com

Pinging www.google.com [2a00:1450:4009:80b::1013] with 32 bytes of data:
Destination host unreachable.
Destination host unreachable.
Destination host unreachable.
Reply from 2a00:1450:4009:80b::1013: time=376ms

Ping statistics for 2a00:1450:4009:80b::1013:
 Packets: Sent = 4, Received = 1, Lost = 3 (75% loss),
Approximate round trip times in milli-seconds:
 Minimum = 376ms, Maximum = 376ms, Average = 376ms

but that may be some other issue entirely.

 Brian



 I happen to be in Norway, just like Tore - but we are in different
 ASes and as far as I know we also use different Akamai and Google
 cache instances.

 No specific problems that I can see.

 Steinar Haug, AS 2116
  .
 




Re: Some very nice broken IPv6 networks at Google and Akamai

2014-11-09 Thread Jeroen Massar
On 2014-11-09 10:42, Daniel Austin wrote:
 Hi,
 
 I've been having terrible connectivity to Google via IPv6 for the last few
 days (I'd even resorted to using Bing!), but I can confirm it is working
 fine for me today.

The Google problem indeed looks fine now. Now for Akamai to
use their weekend to debug their issue ;)

 Google did have some issues, look at the outage list.  They are resolved
 now:

   Damian Menscher dam...@google.com 
 6:44 PM (3 hours ago)

 The issue with IPv6 access to Google should now be resolved.  Please let
 us know if you're still having problems.

The fun question of course is: what/why/how the heck happened?

Greets,
 Jeroen



Re: Some very nice broken IPv6 networks at Google and Akamai (Was: Some very nice IPv6 growth as measured by Google)

2014-11-09 Thread Tore Anderson
* Jeroen Massar

 On 2014-11-08 18:38, Tore Anderson wrote:
  Yannis: «We're enabling IPv6 on our CPEs»
  Jeroen: «And then getting broken connectivity to Google»
  
  I'm not a native speaker of English, but I struggle to understand it
  any other way than you're saying there's something broken about
  Yannis' deployment. I mean, your reply wasn't even a standalone
  statement, but a continuation of Yannis' sentence. :-P
 
 That statement is correct though. As Google and Akamai IPv6 are
 currently broken, enabling IPv6 thus breaks connectivity to those
 sites.

Only if Google and Akamai are universally broken, which does not seem
to have been the case. I tested Google from the RING at 23:20 UTC
yesterday:

redpilllinpro@redpilllinpro01:~$ ring-all -t 120 -n 0 'wget -q -6 --timeout=10
-O /dev/null
https://lh6.googleusercontent.com/-msg_m1V-b-Y/Ufo23yPxnXI/AMw/Mv5WbEC_xzc/w387-h688-no/13%2B-%2B1
&& echo OK || echo FAILED' | egrep '(OK|FAILED)$' | sort | uniq -c
 10 FAILED
255 OK

And Akamai just now (10:30 UTC):

redpilllinpro@redpilllinpro01:~$ ring-all -t 120 -n 0 'wget -q -6 --header
"User-Agent: foo" --timeout=10 -O /dev/null
http://www.akamai.com/images/img/banners/entertainment-home-page-banner-932x251.jpg
&& echo OK || echo FAILED' | egrep '(OK|FAILED)$' | sort | uniq -c
 10 FAILED
252 OK

The files I get are both well over 1500 bytes. Note that (some of)
the FAILED results might be explained by the RING node in question having
generally defective IPv6 connectivity, so they don't have to be
Akamai/Google-specific.

I'll investigate the failing nodes further and let you know if I find
something that points to Google/Akamai-specific problems.

 No, PMTUD is fine in both IPv4 and IPv6.
 
 What is broken is people wrongly recommending to break and/or
 filtering ICMP and thus indeed breaking PMTUD.

There's a critical mass of broken PMTUD on the internet (for whatever
reason). It does not matter whose fault it is; the end result is the
same - the mechanism cannot be relied upon if you actually care about
service quality.

From where I'm sitting, Google is advertising me an IPv6 TCP MSS of
1386. That speaks volumes. I don't believe for a second that my local
Google cluster is on links with an MTU of 1434; the clamped TCP MSS must
have been intentionally configured, and the only reason I can
think of to do so is to avoid PMTUD.
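As background for the numbers being thrown around: for TCP over IPv6, the MSS is the link MTU minus the 40-byte IPv6 header and the 20-byte TCP header. A minimal sketch (assuming no TCP options or IPv6 extension headers):

```python
IPV6_HEADER = 40  # bytes, fixed IPv6 header
TCP_HEADER = 20   # bytes, TCP header without options

def ipv6_tcp_mss(link_mtu):
    """Largest TCP payload that fits in one IPv6 packet on this link."""
    return link_mtu - IPV6_HEADER - TCP_HEADER

def mtu_implied_by_mss(mss):
    """The link MTU that a given advertised MSS implies."""
    return mss + IPV6_HEADER + TCP_HEADER

# Plain 1500-byte Ethernet gives MSS 1440; the IPv6 minimum MTU of
# 1280 gives MSS 1220, the clamp value users in this thread report.
```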

What works fine in theory sometimes fails operationally (cf. 6to4).
Insisting that no problem exists because it's just everyone else
who keeps screwing it up doesn't change operational realities.

 I also have to note that in the 10+ years of having IPv6 we rarely saw
 PMTU issues, and if we did, contacting the site that was filtering
 fixed the issue.

Looking at it from the content side, users using IPv6 tunnels are a
tiny, tiny minority, while still managing to be responsible for a
majority of trouble reports. Our stuff reacts to ICMPv6 PTBs, so it's
not *all* tunnel users that get in trouble at the same time; it's just
that they're susceptible to problems such as:

* Dropping ICMPv6 PTBs emitted by their CPE/tunnel ingress in their
  computer's personal/local firewall.
* The internet tunnel ingress router rate-limiting ICMPv6
  generation. For example, Juniper has a hard 50 pps ICMP generation
  limit per FPC, and at least one Cisco platform has 100/10 by
  default. Given enough traffic on the tunnel router, this limit will
  be exceeded more or less continuously. See the thread «MTU handling
  in 6rd deployments», btw.

Native users are immune to these problems, because they do not have
to rely on PMTUD.
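The rate-limiting effect described above is easy to model with a generic token bucket (a sketch; the 50 pps figure comes from the Juniper example, and this is not any vendor's actual implementation):

```python
class IcmpRateLimiter:
    """Generic token bucket: at most `rate` ICMP errors per second,
    with an initial burst allowance of the same size."""
    def __init__(self, rate):
        self.rate = rate
        self.capacity = float(rate)
        self.tokens = float(rate)
        self.last = 0.0

    def allow(self, now):
        # Refill proportionally to elapsed time, then try to spend a token.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True           # PTB generated
        return False              # PTB suppressed: that flow blackholes

# 200 oversized packets arriving spread over one second, 50 pps limit:
limiter = IcmpRateLimiter(rate=50)
generated = sum(limiter.allow(now=i / 200) for i in range(200))
# The burst (50) plus one second of refill (~50) means only about half
# of the needed PTBs ever get sent; each suppressed one is a stalled
# connection for some tunnel user.
```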

 The two 'workarounds' you mention are all on the *USER* side (RA MTU)
 or in-network, where you do not know if the *USER* has a smaller MTU.

LAN RA MTU, yes. TCP MSS, no - it can be done in the ISP's tunnel
router.

 Hence touching it in the network is a no-no.

It appears to me that the ISPs that are deploying tunnels (6RD) for
their users consider these a yes-yes. Presumably because they've
realised that reducing reliance on PMTUD is in their customers' best
interest, as it gives the best user experience.

Is there *any* ISP in the world that does 6RD that does *not* do TCP MSS
clamping and/or reduced LAN RA MTUs? (Or, for that matter, does IPv4
through PPPoE and does not do TCP MSS clamping?)

For what it's worth, the vast majority of tunneled IPv6 traffic we see
comes from ISPs with 6RD, which generally works fine due to these
workarounds. Thankfully.

  «this must be a major issue for everybody using IPv6 tunnels»
  «MTU 1480 MSS 1220 = fix»
  «the 1480MTU and 1220MSS numbers worked for my pfsense firewall»
  «The only thing that worked here is 1280 MTU / 1220 MSS»
  «clamping the MSS to 1220 seems to have fixed the problem for me»
  «I changed the MSS setting [...] for the moment Google pages are
  loading much better»
  
  This is all perfectly consistent with common PMTUD malfunctioning /
  tunnel suckage.
 
 NOTHING to do with tunnels, 

Re: Some very nice broken IPv6 networks at Google and Akamai (Was: Some very nice IPv6 growth as measured by Google)

2014-11-09 Thread Tore Anderson
* Nick Hilliard

 On 09/11/2014 11:00, Tore Anderson wrote:
  Only if Google and Akamai are universally broken, which does not
  seem to have been the case. I tested Google from the RING at 23:20
  UTC yesterday:
 
 did you do a control run on a known working site?

No. I feel that 250+ successes vs 10 failures is enough to conclude
that Akamai and Google are *not* universally broken, far from it. This
refutes the claim that «Google and Akamai IPv6 are currently broken,
enabling IPv6 thus breaks connectivity to those sites».

Whatever broke, it must have been much more local than that, or only
occurring under certain conditions (e.g., tunnels dependent on PMTUD).

 Not all ring nodes have working IPv6.

Exactly. That's a likely explanation for (some of) the 10 failures.

I redid the tests now, and the failing nodes were:

beanfield01.ring.nlnog.net
bluezonejordan01.ring.nlnog.net
claranet02.ring.nlnog.net
hosteam01.ring.nlnog.net
keenondots01.ring.nlnog.net
maxitel01.ring.nlnog.net
nicchile01.ring.nlnog.net
occaid01.ring.nlnog.net
popsc01.ring.nlnog.net
rackfish01.ring.nlnog.net
robtex01.ring.nlnog.net

Of these, only three were able to ping 2a02:c0::1, which I know should
respond fine. The others got various «no route to host»,
«destination beyond scope of source», and similar errors.

The three that had working IPv6 connectivity were:

hosteam01.ring.nlnog.net
nicchile01.ring.nlnog.net
occaid01.ring.nlnog.net

hosteam01 and occaid01 have defective local DNS; they can't resolve
anything, it seems. So nothing to do with Google and Akamai there.

nicchile01 is the only one that looks interesting, as it works for
Google but not Akamai:

redpilllinpro@nicchile01:~$ wget -6 --header "User-Agent: foo" -O /dev/null 
http://www.akamai.com/images/img/banners/entertainment-home-page-banner-932x251.jpg
--2014-11-09 12:03:41--  
http://www.akamai.com/images/img/banners/entertainment-home-page-banner-932x251.jpg
Resolving www.akamai.com (www.akamai.com)... 2600:1419:7:185::22d9, 
2600:1419:7:189::22d9
Connecting to www.akamai.com (www.akamai.com)|2600:1419:7:185::22d9|:80... 
failed: Connection refused.
Connecting to www.akamai.com (www.akamai.com)|2600:1419:7:189::22d9|:80... 
failed: Connection refused.

However, tcpdump reveals that this isn't Akamai's doing, as the
ICMP errors originate from a NIC Chile-owned IP address.

12:06:19.388093 IP6 2001:1398:32:177::40 > 2001:1398:3:120:200:1:120:28: ICMP6, 
destination unreachable, unreachable port, 2600:1419:7:185::22d9 tcp port 80, 
length 88
12:06:19.389095 IP6 2001:1398:32:177::40 > 2001:1398:3:120:200:1:120:28: ICMP6, 
destination unreachable, unreachable port, 2600:1419:7:189::22d9 tcp port 80, 
length 88

Perhaps they have firewalled out Akamai for some reason?

In any case: in summary, I see *zero* evidence of ubiquitous IPv6
problems with Google and Akamai. So ISPs should not worry about
deploying IPv6, at least if they're doing it natively and don't
expose themselves to PMTUD breakage.

Tore


Re: Some very nice broken IPv6 networks at Google and Akamai (Was: Some very nice IPv6 growth as measured by Google)

2014-11-09 Thread Jeroen Massar
On 2014-11-09 12:00, Tore Anderson wrote:
 * Jeroen Massar
 
 On 2014-11-08 18:38, Tore Anderson wrote:
 Yannis: «We're enabling IPv6 on our CPEs»
 Jeroen: «And then getting broken connectivity to Google»

 I'm not a native speaker of English, but I struggle to understand it
 any other way than you're saying there's something broken about
 Yannis' deployment. I mean, your reply wasn't even a standalone
 statement, but a continuation of Yannis' sentence. :-P

 That statement is correct though. As Google and Akamai IPv6 are
 currently broken, enabling IPv6 thus breaks connectivity to those
 sites.
 
 Only if Google and Akamai are universally broken, which does not seem
 to have been the case. I tested Google from the RING at 23:20 UTC
 yesterday:

And Google confirmed that they fixed something; we'll never really
know what they fixed, though.

Your test was done from colocated hosts, while real people use access
networks.

Thus, while such a test gives insight that some of it works, it does not
cover the corner cases.


Also note that the Akamai problem (which still persists) is a random
one. Hence whether fetching one URL works or not is pure luck. As a
generic page has multiple objects though, you'll hit the problem much
quicker.


 No, PMTUD is fine in both IPv4 and IPv6.

 What is broken is people wrongly recommending to break and/or
 filtering ICMP and thus indeed breaking PMTUD.
 
 There's a critical mass of broken PMTUD on the internet (for whatever
 reason). It does not matter whose fault it is; the end result is the
 same - the mechanism cannot be relied upon if you actually care about
 service quality.
 
 From where I'm sitting, Google is advertising me an IPv6 TCP MSS of
 1386. That speaks volumes. I don't believe for a second that my local
 Google cluster is on links with an MTU of 1434; the clamped TCP MSS must
 have been intentionally configured, and the only reason I can
 think of to do so is to avoid PMTUD.
 
 What works fine in theory sometimes fails operationally (cf. 6to4).
 Insisting that no problem exists because it's just everyone else
 who keeps screwing it up doesn't change operational realities.

I am not 'insisting' that there is no problem with PMTUD.

I am stating that the problem has to be fixed at the source, not hidden
in the network.


 I also have to note that in the 10+ years of having IPv6 we rarely saw
 PMTU issues, and if we did, contacting the site that was filtering
 fixed the issue.
 
 Looking at it from the content side, users using IPv6 tunnels are in a
 tiny, tiny minority, while still managing to be responsible for a
 majority of trouble reports.

Maybe because those users are more technically experienced and are able
to get their message out, while non-techie users just disable IPv6, as
is advised in a LOT of places? :)

[..]
 Native users are immune against these problems, because they do not have
 to use PMTUD.

You are forgetting the little fact that native is a really strange
word. Quite a few DSL deployments use PPPoE etc.

There are also a lot of native deployments out there that use 6rd.


Instead of just coming with TUNNELS SUCK@$!@#$%^!*@%!, actually
contact the networks that are broken and try to get them to fix the
problem. You might not want to fix those as it is not your problem, but
it is a problem for access networks.

Note btw that Google is not stating anything about the problem they had.
And Akamai, well, they are still digging.

Thus PMTUD might be an issue; it might also be something else completely.

Without insight into those systems, one just has to guess.




 The two 'workarounds' you mention are all on the *USER* side (RA MTU)
 or in-network, where you do not know if the *USER* has a smaller MTU.
 
 LAN RA MTU, yes. TCP MSS, no - it can be done in the ISP's tunnel
 router.

Do you really suggest making the Internet have an MTU of 1280? :)

 Hence touching it in the network is a no-no.
 
 It appears to me that the ISPs that are deploying tunnels (6RD) for
 their users consider these a yes-yes. Presumably because they've
 realised that reducing reliance on PMTUD is in their customer's best
 interest, as it gives the best user experience.
 
 Is there *any* ISP in the world that does 6RD that does *not* do TCP MSS
 clamping and/or reduced LAN RA MTUs? (Or, for that matter, does IPv4
 through PPPoE and does not do TCP MSS clamping?)
 
 For what it's worth, the vast majority of tunneled IPv6 traffic we see
 comes from ISPs with 6RD, which generally works fine due to these
 workarounds. Thankfully.

Till people start using non-TCP protocols, and everything breaks.

Hence, don't hide the fact, instead fix it.

[..]
 That is indeed an assumption, as we can't see the Google/Akamai end of
 the connection.
 
 If you see failures on MTU=1500 links, I think there must be at least
 two distinct problems at play. When users report «MTU 1480 MSS 1220 =
 fix», then that is extremely indicative of a PMTUD problem.

For the Google case that was reported. 

Re: Some very nice broken IPv6 networks at Google and Akamai (Was: Some very nice IPv6 growth as measured by Google)

2014-11-09 Thread Job Snijders
On Sun, Nov 09, 2014 at 08:03:01PM +0100, Jeroen Massar wrote:
  No. I feel that 250+ successes vs 10 failures is enough to conclude
  that Akamai and Google are *not* universally broken, far from it.
 
 Testing from colo'd boxes on well-behaved networks (otherwise they would
 not know about or be part of the RING), while the problem lies with actual
 home users, is quite a difference.

I can't comment on the validity of the tests performed, but I'd like to
point out one thing: I like that the NLNOG RING is very diverse,
especially in terms of the nodes' IPv6 connectivity.

Some hosts are behind exotic 6to4 NATted tunnels, others behind regular
tunnels, some inadvertently block useful ICMPv6 messages, some networks
are just broken.

For NLNOG RING applications we mandate that there is one globally unique
IPv6 address on the host; we do not specify how this should be
accomplished. This leads to some variety, and not all of those
implementations would I describe as well behaved.

Kind regards,

Job


Re: Some very nice broken IPv6 networks at Google and Akamai (Was: Some very nice IPv6 growth as measured by Google)

2014-11-09 Thread Tore Anderson
* Jeroen Massar

 Testing from colo'd boxes on well-behaved networks (otherwise they
 would not know about or be part of the RING), while the problem lies with
 actual home users, is quite a difference.

So far you've been claiming that the problem lies with Google or
Akamai. If true - and I don't dispute that it is - then testing from
the RING should work just as well as from any home network.

And, as Job has pointed out, the RING nodes are not all «well behaved».

 Also, the statement «universally broken» comes from you.

I refer to this blanket statement of yours, responding to my
paraphrasing you and Yannis:

Yannis: «We're enabling IPv6 on our CPEs»
Jeroen: «And then getting broken connectivity to Google»

You: «That statement is correct though. As Google and Akamai IPv6 are
currently broken, enabling IPv6 thus breaks connectivity to those
sites. Not enabling IPv6 thus is a better option in such a situation.»

In order for this to be correct, Google and Akamai must necessarily be
universally broken over IPv6.

If, on the other hand, the problem is not universal, but only occurs in
certain corner cases (such as when hitting the cluster in Mexico
City, when the client is behind a <1500B MTU link, or whatever), then
you have no reason to claim that ISPs in general (like OTE) will
break connectivity to Akamai and Google when they enable IPv6.

  Thus refuting the claim that «Google and Akamai IPv6 are currently
  broken, enabling IPv6 thus breaks connectivity to those sites».
 
 As Google has admitted fixing it, you have been proven wrong.

I don't dispute that there is or has been *a* problem, only the scope
of it.

The way I see it, most of the available data points to there indeed
being a problem specific to tunnels/PMTUD (which I've said all along,
cf. tunnels suck). Perhaps Google turned up a new cluster and forgot
to enable TCP MSS clamping or something like that. No idea about the
Akamai one.

 Actually, I wonder why you are trying to fight so hard that various
 people have reported this problem. You are apparently not working for
 either Google or Akamai, you are not an access network, your network
 is not involved either; hence... what is your problem with such a
 statement?

My problem is with your claim that «not enabling IPv6 thus is a better
option in such a situation».

Whatever the problem is or was, it did not affect everyone - most
likely it affected just a tiny fraction of users - otherwise I think we
would have heard way more complaints from all over. There are millions
of IPv6 users out there in the world, and without Google (+YouTube/Gmail)
and Akamai (+Facebook), the internet doesn't work.

With no more specifics known about what went wrong, ISPs have zero
reason to stall their IPv6 rollouts, since there is no reason to assume
that they will be impacted by the problem. So: OTE.gr, Telefonica.cz,
Telenor.no, Telepac.pt, and others - go go go!

BTW: Some of our customers are heavy users of Akamai for video
streaming, and many have lots of interaction with various Google
services. So I have plenty of reason to care about any problem of
theirs.

Tore


Re: Some very nice broken IPv6 networks at Google and Akamai (Was: Some very nice IPv6 growth as measured by Google)

2014-11-09 Thread Doug Barton

On 11/9/14 12:27 PM, Tore Anderson wrote:

So far you've been claiming that the problem lies with Google or
Akamai. If true - and I don't dispute that it is - then testing from
the RING should work just as well as from any home network.


No, that's not true at all. Eyeball networks have very different 
characteristics than colos. Sure, there will be some overlap, but your 
statement above is demonstrably false.


It's also true that both Google and Akamai have admitted problems with 
IPv6, and Google claims to have fixed them. So at this point it's not at 
all clear what you're arguing about, other than an Asperger'y need to 
prove that something you said was correct at some point in some context. 
So can you please just let it go, and let's return this list back to its 
normally high S::N?


Doug



Re: Some very nice broken IPv6 networks at Google and Akamai

2014-11-09 Thread Jeroen Massar
On 2014-11-09 22:10, Lorenzo Colitti wrote:
On Sat, Nov 8, 2014 at 11:48 PM, Jeroen Massar jer...@massar.ch wrote:
 
  The issue with IPv6 access to Google should now be resolved.  Please 
 let
  us know if you're still having problems.
 
 The fun question of course is: what/why/how the heck happened?
 
 
 Another fun question is why folks are relying on PMTUD instead of
 adjusting their MTU settings (e.g., via RAs).

Because why would anybody want to penalize their INTERNAL network?

Does Google run a non-1500 MTU internally? I hope you are at least
running jumbo frames internally (the ~9000-byte ones, not the 65k+
IPv6 jumbograms, though ;)

Also, nobody knows if a link somewhere in the middle of the path to
the destination has a non-1500 MTU.

 But relying on PMTUD still
 imposes a 1-RTT penalty on every TCP connection to an IPv6 address you
 haven't talked to in the last few minutes. Why would you do that to your
 connection?

Because you can't know if that is always the case.

 As to what happened: what happened here is that some Google servers
 temporarily stopped doing MSS clamping. That was an outage, and AIUI it
 has since been fixed.

Thanks for admitting AND explaining what the problem is.

As you work at Google, have you ever heard of this QUIC protocol that
does not use TCP?

Maybe you want to ask your colleagues about that :)

 (Some parts of) Google infrastructure do not do
 PMTUD for the latency reasons above and for reasons similar to those
 listed
 in https://tools.ietf.org/html/draft-v6ops-jaeggli-pmtud-ecmp-problem-00 .

As such, you are ON PURPOSE breaking PMTUD, instead trying to fix it
with some other band-aid.

And thus you are hiding problems that will happen when QUIC actually
starts to be used?

Or are you going to just reset the internet to 1280? :)

Greets,
 Jeroen



Re: Some very nice broken IPv6 networks at Google and Akamai (Was: Some very nice IPv6 growth as measured by Google)

2014-11-09 Thread Jeroen Massar
On 2014-11-09 21:27, Tore Anderson wrote:
 * Jeroen Massar
 
 Testing from colo'd boxes on well-behaved networks (otherwise they
 would not know about or be part of the RING), while the problem lies with
 actual home users, is quite a difference.
 
 So far you've been claiming that the problem lies with Google or
 Akamai.

Google has acknowledged it; apparently they are doing MSS clamping on
*THEIR* side (while they don't know what your network looks like ;)

Akamai is still investigating.

 If true - and I don't dispute that it is - then testing from
 the RING should work just as well as from any home network.

Completely different environment.

 And, as Job has pointed out, the RING nodes are not all «well behaved».

The 10 nodes that failed demonstrate exactly that.

As you have contact with these folks, ask them to fix that situation.

 Also the statement universally broken comes from you.
 
 I refer to this blanket statement of yours, responding to my
 paraphrasing you and Yannis:
 
 «Yannis: «We're enabling IPv6 on our CPEs
 Jeroen: «And then getting broken connectivity to Google»
 
 You: «That statement is correct though. As Google and Akamai IPv6 are
 currently broken, enabling IPv6 thus breaks connectivity to those
 sites. Not enabling IPv6 thus is a better option in such a situation.»
 
 In order for this to be correct, Google and Akamai must necessarily be
 universally broken over IPv6.

Why are you so hung up on words, while by your own admission you are
not a native speaker?

Can you please stop bickering over those words?

Google has admitted they broke something and fixed it.

Stop getting so hung up on this.

Greets,
 Jeroen



Re: Some very nice broken IPv6 networks at Google and Akamai (Was: Some very nice IPv6 growth as measured by Google)

2014-11-09 Thread Jeroen Massar
On 2014-11-09 22:10, Tore Anderson wrote:
 * Jeroen Massar
 
 Also note that the Akamai problem (which still persists) is a random
 one. Hence fetching one URL is just a pure luck thing if it works or
 not. As a generic page has multiple objects though, you'll hit it much
 quicker.
 
 Hm. As I've said before - WFM. Any more information you could provide
 to help me try to reproduce it?

Try reading the links provided.

They contain the details that users have provided.

Note again: the Google problem has been fixed (spoofing the MSS is not
fixing the problem).

The Akamai things seems to still be in progress.


 I am not 'insisting' that there is no problem with PMTUD.
 
 «No, PMTUD is fine in both IPv4 and IPv6», you said...

Fine is not perfect.

Also, taking single sentences out of somebody's comment does not convey
the whole argument.

I have stated several times that there ARE issues with PMTUD and that
people need to fix them instead of hide them.

Again, please stop getting hung up on words.

 I am stating that the problem has to be fixed at the source, not
 hidden in the network.
 
 In an ideal world, perhaps. It's like with 6to4; if all relay operators
 did a wonderful job, and no-one filtered proto-41, and nobody did
 NAT44, then 6to4 would just be hunky-dory. But there's just too much
 brokenness out there.
 
 Same with PMTUD. It's beyond repair, IMHO. The pragmatic thing is to
 accept that and move on.

What you are saying is to just stick to an MTU of 1280 and TCP for
everything, forgetting about ever being able to move to anything other
than TCP.

As QUIC is deployed and HTTP/2 is coming, forget about that.

You will need to address these concerns properly.


[..]
 Or that the tunnel ingress routers rate-limit ICMPv6 error generation.

sixxsd does not have this problem. There are no rate limits.

Thus at least everybody behind SixXS tunnels will not have that issue.

Contact your vendor to resolve your problems.
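(Editorial aside, not part of the original mail: the rate-limiting being
discussed is the token-bucket throttle that RFC 4443 recommends routers
apply when generating ICMPv6 errors such as Packet Too Big. A minimal
illustrative sketch follows; the class name and parameters are invented
for illustration and have nothing to do with sixxsd or any vendor's
implementation.)

```python
# Hypothetical sketch of RFC 4443-style token-bucket rate limiting of
# ICMPv6 error generation. When the bucket is empty, the router silently
# drops the error -- which is exactly how PMTUD black holes can appear
# on rate-limited tunnel ingress routers.
import time

class TokenBucket:
    def __init__(self, rate: float, burst: int):
        self.rate = rate          # tokens replenished per second
        self.capacity = burst     # maximum bucket size
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Return True if one ICMPv6 error may be generated now."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # over the limit: the error message is suppressed

limiter = TokenBucket(rate=100.0, burst=10)
print(limiter.allow())  # the first error within the burst is allowed
```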

 You are forgetting the little fact that native is a really strange
 word. Quite a few DSL deployments use PPPoE etc.

 There are also a lot of native deployments out there that use 6rd.
 
 In my experience, these ISPs deploy workarounds to avoid PMTUD. TCP MSS
 clamping, and LAN RA MTUs (for IPv6). That helps.

For TCP, not for anything else.

Chrome speaks QUIC to various Google properties.

 Instead of just coming with TUNNELS SUCK@$!@#$%^!*@%! actually
 Contact the networks that are broken and try to get them to fix the
 problem. You might not want to fix those as it is not your problem,
 but it is a problem for access networks.
 
 I think PMTUD on the internet is broken beyond salvation

Then please give up on it and let the rest of the world care about it
and notify folks and let them fix the problem properly.

Greets,
 Jeroen



Some very nice broken IPv6 networks at Google and Akamai (Was: Some very nice IPv6 growth as measured by Google)

2014-11-08 Thread Jeroen Massar
On 2014-11-08 18:38, Tore Anderson wrote:
 * Jeroen Massar
 
 The only link: they are all using IPv6.

 You are trying to make this OTE link. I have never stated anything
 like that. Though, you likely take that from the fact that the reply
 followed in that thread.
 
 Yannis: «We're enabling IPv6 on our CPEs»
 Jeroen: «And then getting broken connectivity to Google»
 
 I'm not a native speaker of English, but I struggle to understand it
 any other way than you're saying there's something broken about
 Yannis' deployment. I mean, your reply wasn't even a standalone
 statement, but a continuation of Yannis' sentence. :-P

That statement is correct though. As Google and Akamai IPv6 are
currently broken, enabling IPv6 thus breaks connectivity to those sites.

Not enabling IPv6 thus is a better option in such a situation.

But it was just a hook into it. Don't further worry about it.

 Anyway, I'm relieved to hear that there's no reason to supect IPv6
 breakage in OTE. As we host a couple of the top-10 Greek sites, one of
 which has IPv6, we're dependent on the big Greek eyeball network like
 OTE to not screw up their IPv6 deployment - it is *I* who get in trouble
 if they do. :-)

But your network was not involved in the above statement.

And if you monitor your sites correctly, also from non-native setups,
then you should be fine.

 PMTUD is fine.

 What sucks is 'consultants' advising blocking ICMPv6 because that is
 what we do in IPv4 and that some hardware/software gets broken once
 in a while.
 
 PMTUD is just as broken in IPv4, too.

No, PMTUD is fine in both IPv4 and IPv6.

What is broken is people wrongly recommending to break and/or filtering
ICMP and thus indeed breaking PMTUD.

 PMTUD has *never* been «fine»,
 neither for IPv4 nor for IPv6. That's why everyone who provides links
 with MTUs < 1500 resorts to workarounds such as TCP MSS clamping and

I am one of the people on this planet providing a LOT of links with
MTUs < 1500 and we really will never resort to clamping MSS.

It does not fix anything. It only hides the problem and makes diagnosing
issues problematic as one does not know if that trick is being applied
or not.

I also have to note that in the 10+ years of having IPv6 we rarely saw
PMTU issues, and if we did, contacting the site that was filtering fixed
the issue.

 reducing MTU values in LAN-side RAs,

That is an even worse offender than MSS. Though at least visible in
tracepath6. Note that you are limiting packet sizes on your local
network because somewhere some person is filtering ICMP.

 so that reliance on PMTUD
 working is limited as much as possible. If you want to deliver an
 acceptable service (either as an ISP or as a content hoster), you just
 *can't* depend on PMTUD.

The two 'workarounds' you mention are both on the *USER* side (RA MTU) or
in-network, where you do not know if the *USER* has a smaller MTU.

Hence touching it in the network is a no-no.

 Even when PMTUD actually works as designed it sucks, as it causes
 latency before data may be successfully transmitted.

I fully agree with that statement. But this Internet thing is global and
one cannot enforce small or large packets on the world just because some
technologies do not support them.

Note that PMTUD is typically cached per destination. Unfortunately there
is no way for a router to securely say «this MTU applies to the whole
/64 or /32 etc.»; that would have been beneficial.
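(Editorial aside, not part of the original mail: the per-destination
caching mentioned here is what RFC 8201 describes hosts doing, with
expiry so a stale low MTU is eventually re-probed. The sketch below is
an invented illustration of that idea; the class and parameter names are
hypothetical, not taken from any real stack.)

```python
# Hypothetical host-side path-MTU cache keyed per destination address.
# Entries come from ICMPv6 Packet Too Big messages and expire after
# roughly ten minutes, the re-probe interval RFC 8201 suggests.
import time

IPV6_MIN_MTU = 1280  # IPv6 links must support at least this

class PMTUCache:
    def __init__(self, link_mtu: int, ttl: float = 600.0):
        self.link_mtu = link_mtu
        self.ttl = ttl
        self._cache = {}  # destination address -> (mtu, timestamp)

    def note_packet_too_big(self, dst: str, reported_mtu: int) -> None:
        # Clamp to the IPv6 minimum; ignore bogus larger-than-link values.
        mtu = max(IPV6_MIN_MTU, min(reported_mtu, self.link_mtu))
        self._cache[dst] = (mtu, time.monotonic())

    def mtu_for(self, dst: str) -> int:
        entry = self._cache.get(dst)
        if entry is None or time.monotonic() - entry[1] > self.ttl:
            return self.link_mtu  # unknown or expired: retry the full MTU
        return entry[0]

cache = PMTUCache(link_mtu=1500)
cache.note_packet_too_big("2a00:1450:4009:80b::1013", 1280)
print(cache.mtu_for("2a00:1450:4009:80b::1013"))  # 1280
```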

 See the threads I referenced, they are still in the above quoted text.

 Note that the Google case is consistent: (as good as) every IPv6
 connection breaks.

 The Akamai case is random: sometimes it just works as you hit good
 nodes in the cluster, sometimes it breaks.
 
 I see in the threads referenced things statements such as:
 
 «this must be a major issue for everybody using IPv6 tunnels»
 «MTU 1480 MSS 1220 = fix»
 «the 1480MTU and 1220MSS numbers worked for my pfsense firewall»
 «The only thing that worked here is 1280 MTU / 1220 MSS»
 «clamping the MSS to 1220 seems to have fixed the problem for me»
 «I changed the MSS setting [...] for the moment Google pages are
 loading much better»
 
 This is all perfectly consistent with common PMTUD malfunctioning /
 tunnel suckage.

NOTHING to do with tunnels, everything to do with somebody not
understanding PMTUD and breaking it, be that on purpose or not.
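(Editorial aside, not part of the original mail: the MTU/MSS pairs users
quote above follow directly from header arithmetic, which this minimal
sketch illustrates. The function name is invented for illustration.)

```python
# TCP MSS = link MTU minus IP header minus TCP header. For IPv6 the
# fixed header is 40 bytes; a TCP header without options is 20 bytes.
IPV6_HEADER = 40
TCP_HEADER = 20

def mss_for_ipv6(mtu: int) -> int:
    """Largest TCP segment that fits in one IPv6 packet of the given MTU."""
    return mtu - IPV6_HEADER - TCP_HEADER

print(mss_for_ipv6(1280))  # 1220 -- the clamp value users reported
print(mss_for_ipv6(1480))  # 1420 -- so the quoted "MTU 1480 / MSS 1220"
                           # pairing is more conservative than required
```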

Note that both Google and Akamai know PMTUD very well, and up to
about a week ago both were running perfectly fine.

 I'm therefore very sceptical that this problem would
 also be experienced by users with 1500 byte MTU links.

Tested failing also on MTU=1500 links.

 (Assuming there's only a single problem at play here.)

That is indeed an assumption, as we can't see the Google/Akamai end of
the connection.

Note that in the Akamai case it is a random thing, it happens to random
nodes inside the cluster (at least, that is my assumption...)

 In both cases, it is hard to say what exactly breaks as only the
 people in those 

Re: Some very nice broken IPv6 networks at Google and Akamai

2014-11-08 Thread sthaug
  I'm not a native speaker of English, but I struggle to understand it
  any other way than you're saying there's something broken about
  Yannis' deployment. I mean, your reply wasn't even a standalone
  statement, but a continuation of Yannis' sentence. :-P
 
 That statement is correct though. As Google and Akamai IPv6 are
 currently broken, enabling IPv6 thus breaks connectivity to those sites.
 
 Not enabling IPv6 thus is a better option in such a situation.

I'm afraid I don't see the supporting evidence here. From my point
of view, Google and Akamai IPv6 both work just fine.

I happen to be in Norway, just like Tore - but we are in different
ASes and as far as I know we also use different Akamai and Google
cache instances. 

No specific problems that I can see.

Steinar Haug, AS 2116


Re: Some very nice broken IPv6 networks at Google and Akamai

2014-11-08 Thread Tarko Tikan

hey,


I'm afraid I don't see the supporting evidence here. From my point
of view, Google and Akamai IPv6 both work just fine.


Concur. Both work just fine from my POV and I don't see lower than usual 
IPv6 traffic levels.


--
tarko


Re: Some very nice broken IPv6 networks at Google and Akamai

2014-11-08 Thread Joe Hamelin
Google did have some issues, look at the outage list.  They are resolved
now:

Damian Menscher dam...@google.com
6:44 PM (3 hours ago)

The issue with IPv6 access to Google should now be resolved.  Please let us
know if you're still having problems.

--
Joe Hamelin, W7COM, Tulalip, WA, 360-474-7474

On Sat, Nov 8, 2014 at 7:09 PM, Brian E Carpenter 
brian.e.carpen...@gmail.com wrote:

 On 09/11/2014 09:19, sth...@nethelp.no wrote:
  I'm not a native speaker of English, but I struggle to understand it
  any other way than you're saying there's something broken about
  Yannis' deployment. I mean, your reply wasn't even a standalone
  statement, but a continuation of Yannis' sentence. :-P
  That statement is correct though. As Google and Akamai IPv6 are
  currently broken, enabling IPv6 thus breaks connectivity to those sites.
 
  Not enabling IPv6 thus is a better option in such a situation.
 
  I'm afraid I don't see the supporting evidence here. From my point
  of view, Google and Akamai IPv6 both work just fine.

 I have to say they both look a bit spotty from Honolulu right now, e.g.

 C:\windows\system32> ping -6 www.google.com

 Pinging www.google.com [2a00:1450:4009:80b::1013] with 32 bytes of data:
 Destination host unreachable.
 Destination host unreachable.
 Destination host unreachable.
 Reply from 2a00:1450:4009:80b::1013: time=376ms

 Ping statistics for 2a00:1450:4009:80b::1013:
 Packets: Sent = 4, Received = 1, Lost = 3 (75% loss),
 Approximate round trip times in milli-seconds:
 Minimum = 376ms, Maximum = 376ms, Average = 376ms

 but that may be some other issue entirely.

 Brian


 
  I happen to be in Norway, just like Tore - but we are in different
  ASes and as far as I know we also use different Akamai and Google
  cache instances.
 
  No specific problems that I can see.
 
  Steinar Haug, AS 2116