Re: Some very nice broken IPv6 networks at Google and Akamai
On 2014-11-11 16:00, Emanuel Popa wrote:

> Hi, Is there any way to intentionally and immediately get on Google's DNS blacklist in order to avoid similar outages in the future affecting only IPv6 traffic? http://www.google.com/intl/en_ALL/ipv6/statistics/data/no_.txt Or maybe the smart thing to do is building another ISP-controllable blacklist of broken domains and telling BIND on the caches to return only A records for blacklisted domains. Or the other way around: only AAAA records for IPv4-broken/blacklisted domains...

As most modern clients do Happy Eyeballs, you could just null-route the destination prefixes and see all clients fall back to IPv4. But it is rather evil to do that, especially at an ISP level. Could have done that for SixXS and given people working stuff that way, but that would not have actually resolved the problem, just hidden it.

If you expect that they have outages that they cannot quickly see, then you should also expect a blacklist like that to be broken or not properly updated. Hence, better to see the problems and to alert the folks so that they can fix these issues properly (though Google is now just hacking around with MSS clamping...).

They typically do not have these issues; they just did not notice it this time around, and thus it took a while for them to wake up (timezones :), figure out what it was and fix the issue. I am fairly confident though that Google is now monitoring their stuff correctly. Lots of good folks there; stuff breaks, they fix it.

Greets, Jeroen
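[Editor's note: a minimal sketch of the Happy Eyeballs behaviour (RFC 6555) Jeroen is relying on above: try IPv6 first, and if it has not connected within a short grace period, fall back to IPv4. The timings and the 300 ms grace value are illustrative assumptions, not real sockets or a real implementation.]

```python
# Simulated Happy Eyeballs decision: which address family does the
# client end up using? Inputs are the time (ms) until each family's
# connect would succeed, or None if it never does (e.g. the prefix
# was null-routed, as suggested above). Purely illustrative numbers.

GRACE_MS = 300  # fallback delay; RFC 6555 suggests 150-250 ms, 300 is common

def happy_eyeballs(v6_connect_ms, v4_connect_ms):
    if v6_connect_ms is not None and v6_connect_ms <= GRACE_MS:
        return "IPv6"  # IPv6 answered before the fallback timer fired
    if v4_connect_ms is None:
        return "IPv6" if v6_connect_ms is not None else None
    if v6_connect_ms is None:
        return "IPv4"  # IPv6 black-holed: client silently falls back
    # both work eventually; IPv4 starts only after the grace period
    return "IPv6" if v6_connect_ms < GRACE_MS + v4_connect_ms else "IPv4"

print(happy_eyeballs(40, 35))    # healthy dual stack -> IPv6
print(happy_eyeballs(None, 35))  # v6 destination null-routed -> IPv4
```

This is why null-routing "works" from the user's perspective (pages still load over IPv4) while, as Jeroen notes, it only hides the IPv6 breakage rather than fixing it.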
Re: Some very nice broken IPv6 networks at Google and Akamai
On 11/11/2014 15:00, Emanuel Popa wrote:

> Is there any way to intentionally and immediately get on Google's DNS blacklist in order to avoid similar outages in the future affecting only IPv6 traffic? http://www.google.com/intl/en_ALL/ipv6/statistics/data/no_.txt Or maybe the smart thing to do is building another ISP-controllable blacklist of broken domains and telling BIND on the caches to return only A records for blacklisted domains. Or the other way around: only AAAA records for IPv4-broken/blacklisted domains...

... or alternatively, depend on Google, Akamai and others not breaking. This is what we do for IPv4 and it normally works well, but not always.

Bear in mind that every time a hack is installed to work around a potential future problem, that hack needs maintenance, and if it breaks, there's a chance that the resulting damage will be at least as bad as what it was seeking to avoid in the first place. Unless there are persistent reliability problems, hacks tend not to be worth it.

Nick
Re: Some very nice broken IPv6 networks at Google and Akamai
On 2014-11-11 19:09, Andras Toth wrote:

> On Tue, Nov 11, 2014 at 3:36 PM, Jeroen Massar jer...@massar.ch wrote:
> > If you expect that they have outages that they cannot quickly see, then you should also expect a blacklist like that to be broken or not properly updated. Hence, better to see the problems and to alert the folks so that they can fix these issues properly (though Google is now just hacking around with MSS clamping...).
>
> [de-cloak] Google has been doing MSS clamping for a long time. I've seen this myself in packet captures, and Lorenzo also confirmed it in his email: "...some Google servers temporarily stopped doing MSS clamping." They do it for a good reason: to prevent PMTUD, as it introduces delay and their customers (eyeballs) wouldn't like it. Lorenzo and others have explained this too, several times.

That explanation was, at least for me, seen for the first time in this thread.

As stated, the MSS clamping is just hiding the real problems. It does not properly resolve anything.

> The world is not spinning around SixXS and your design ideas. Please turn off write-only.

Wow, here come the ad-hominem attacks; stay in lurk mode if you can't handle people raising issues. If I had not commented about this problem, it would never have come to light... maybe in several years, when nothing could have been done anymore. But today, we still can fix things.

Please realize that the world does have a lot more users than SixXS. Noting problems and properly fixing them are important.

Greets, Jeroen
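[Editor's note: for readers unfamiliar with the mechanism being argued about, here is a minimal sketch of what an MSS-clamping middlebox does to a passing TCP SYN: it locates the MSS option (kind 2, length 4, per RFC 793) in the TCP options and lowers its value if it exceeds the configured clamp. The 1386 clamp used in the example is the value Tore reports seeing from Google later in this thread; the code itself is an illustration, not anyone's actual implementation.]

```python
import struct

def clamp_mss(tcp_options: bytes, clamp: int) -> bytes:
    """Rewrite the MSS option in raw TCP option bytes, clamping it."""
    out = bytearray(tcp_options)
    i = 0
    while i < len(out):
        kind = out[i]
        if kind == 0:                  # End of Option List
            break
        if kind == 1:                  # NOP padding, single byte
            i += 1
            continue
        length = out[i + 1]
        if kind == 2 and length == 4:  # MSS option: kind, len, 16-bit value
            (mss,) = struct.unpack_from("!H", out, i + 2)
            if mss > clamp:
                struct.pack_into("!H", out, i + 2, clamp)
        i += length
    return bytes(out)

# A SYN advertising MSS 1440 (typical IPv6 host on a 1500 MTU link),
# clamped down to 1386 (0x056a):
opts = struct.pack("!BBH", 2, 4, 1440)
print(clamp_mss(opts, 1386).hex())  # -> 0204056a
```

Because the clamp lives inside a TCP option, this trick helps TCP only; that limitation is exactly what Jeroen's later QUIC objection is about.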
Re: Some very nice broken IPv6 networks at Google and Akamai
Hi Lorenzo,

On 9 Nov 2014, at 22:10, Lorenzo Colitti lore...@google.com wrote:

> On Sat, Nov 8, 2014 at 11:48 PM, Jeroen Massar jer...@massar.ch wrote: "The issue with IPv6 access to Google should now be resolved. Please let us know if you're still having problems." "The fun question of course is: what/why/how the heck happened?"
>
> Another fun question is why folks are relying on PMTUD instead of adjusting their MTU settings (e.g., via RAs). But relying on PMTUD still imposes a 1-RTT penalty on every TCP connection to an IPv6 address you haven't talked to in the last few minutes. Why would you do that to your connection?

I guess most users wouldn't really notice a 1-RTT delay. But I agree that it is less than optimal that every outbound connection from a network behind a non-1500-MTU link has to suffer this penalty. Unfortunately, the current choices seem to be either to limit the link MTU (making traffic to e.g. the local NAS suffer as well) or to suffer the 1-RTT penalty.

> As to what happened: what happened here is that some Google servers temporarily stopped doing MSS clamping. That was an outage, and AIUI it has since been fixed. (Some parts of) Google infrastructure do not do PMTUD for the latency reasons above and for reasons similar to those listed in https://tools.ietf.org/html/draft-v6ops-jaeggli-pmtud-ecmp-problem-00 .

Thank you for the information. Great to have real data instead of guesses and speculation :)

Cheers! Sander
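[Editor's note: the back-of-the-envelope arithmetic behind the MTU/MSS numbers traded in this thread. An IPv6 TCP segment carries a 40-byte IPv6 header plus a 20-byte TCP header, so the largest safe MSS for a given link MTU is MTU − 60 (TCP options shrink it further). The link-type labels are editorial annotations.]

```python
IPV6_HDR = 40  # fixed IPv6 header
TCP_HDR = 20   # TCP header without options

def ipv6_tcp_mss(mtu: int) -> int:
    """Largest MSS that fits an IPv6 TCP segment into the given MTU."""
    return mtu - IPV6_HDR - TCP_HDR

for mtu in (1500, 1480, 1280):
    print(mtu, ipv6_tcp_mss(mtu))
# 1500 -> 1440  (plain Ethernet)
# 1480 -> 1420  (e.g. a 6in4/6rd tunnel carried over 1500)
# 1280 -> 1220  (IPv6 minimum MTU; matches the "MSS 1220" fix
#                users quoted later in this thread report)
```

This is also why a clamped MSS smaller than MTU − 60 (such as the 1386 Tore observes) implies the operator is deliberately leaving headroom rather than reflecting a real link MTU.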
Re: Some very nice broken IPv6 networks at Google and Akamai
Hi Philipp,

On 10 Nov 2014, at 21:09, Philipp Kern pk...@debian.org wrote:

> On Mon, Nov 10, 2014 at 07:36:22PM +0100, Sander Steffann wrote: "I guess most users wouldn't really notice a 1-RTT delay." Depends on the RTT. In mobile networks it generally sucks.

Good point :)

Sander
Re: Some very nice broken IPv6 networks at Google and Akamai
Hi,

I've been having terrible connectivity to Google via IPv6 the last few days (I'd even resorted to using Bing!), but can confirm it is working fine for me today.

Thanks, Dan.

On 09/11/2014 06:26, Joe Hamelin wrote:

> Google did have some issues, look at the outage list. They are resolved now: Damian Menscher dam...@google.com, 6:44 PM (3 hours ago): "The issue with IPv6 access to Google should now be resolved. Please let us know if you're still having problems."
>
> -- Joe Hamelin, W7COM, Tulalip, WA, 360-474-7474

On Sat, Nov 8, 2014 at 7:09 PM, Brian E Carpenter brian.e.carpen...@gmail.com wrote:

> On 09/11/2014 09:19, sth...@nethelp.no wrote: "I'm not a native speaker of English, but I struggle to understand it any other way than you're saying there's something broken about Yannis' deployment. I mean, your reply wasn't even a standalone statement, but a continuation of Yannis' sentence. :-P" "That statement is correct though. As Google and Akamai IPv6 are currently broken, enabling IPv6 thus breaks connectivity to those sites. Not enabling IPv6 thus is a better option in such a situation." "I'm afraid I don't see the supporting evidence here. From my point of view, Google and Akamai IPv6 both work just fine."
>
> I have to say they both look a bit spotty from Honolulu right now, e.g.
>
>   C:\windows\system32> ping -6 www.google.com
>
>   Pinging www.google.com [2a00:1450:4009:80b::1013] with 32 bytes of data:
>   Destination host unreachable.
>   Destination host unreachable.
>   Destination host unreachable.
>   Reply from 2a00:1450:4009:80b::1013: time=376ms
>
>   Ping statistics for 2a00:1450:4009:80b::1013:
>       Packets: Sent = 4, Received = 1, Lost = 3 (75% loss),
>   Approximate round trip times in milli-seconds:
>       Minimum = 376ms, Maximum = 376ms, Average = 376ms
>
> but that may be some other issue entirely.

Brian, I happen to be in Norway, just like Tore - but we are in different ASes, and as far as I know we also use different Akamai and Google cache instances. No specific problems that I can see.

Steinar Haug, AS 2116
Re: Some very nice broken IPv6 networks at Google and Akamai
On 2014-11-09 10:42, Daniel Austin wrote:

> Hi, I've been having terrible connectivity to Google via IPv6 the last few days (I'd even resorted to using Bing!), but can confirm it is working fine for me today.

It indeed is looking fine now for the Google problem. Now for Akamai to use their weekend to debug their issue ;)

> Google did have some issues, look at the outage list. They are resolved now: Damian Menscher dam...@google.com, 6:44 PM (3 hours ago): "The issue with IPv6 access to Google should now be resolved. Please let us know if you're still having problems."

The fun question of course is: what/why/how the heck happened?

Greets, Jeroen
Re: Some very nice broken IPv6 networks at Google and Akamai (Was: Some very nice IPv6 growth as measured by Google)
* Jeroen Massar

> On 2014-11-08 18:38, Tore Anderson wrote:
> > Yannis: «We're enabling IPv6 on our CPEs»
> > Jeroen: «And then getting broken connectivity to Google»
> > I'm not a native speaker of English, but I struggle to understand it any other way than you're saying there's something broken about Yannis' deployment. I mean, your reply wasn't even a standalone statement, but a continuation of Yannis' sentence. :-P
> That statement is correct though. As Google and Akamai IPv6 are currently broken, enabling IPv6 thus breaks connectivity to those sites.

Only if Google and Akamai are universally broken, which does not seem to have been the case. I tested Google from the RING at 23:20 UTC yesterday:

redpilllinpro@redpilllinpro01:~$ ring-all -t 120 -n 0 'wget -q -6 --timeout=10 -O /dev/null https://lh6.googleusercontent.com/-msg_m1V-b-Y/Ufo23yPxnXI/AMw/Mv5WbEC_xzc/w387-h688-no/13%2B-%2B1 && echo OK || echo FAILED' | egrep '(OK|FAILED)$' | sort | uniq -c
     10 FAILED
    255 OK

And Akamai just now (10:30 UTC):

redpilllinpro@redpilllinpro01:~$ ring-all -t 120 -n 0 'wget -q -6 --header "User-Agent: foo" --timeout=10 -O /dev/null http://www.akamai.com/images/img/banners/entertainment-home-page-banner-932x251.jpg && echo OK || echo FAILED' | egrep '(OK|FAILED)$' | sort | uniq -c
     10 FAILED
    252 OK

The files I get are both plenty larger than 1500B. Note that (some of) the FAILED might be explained by the RING node in question having generally defective IPv6 connectivity, so it doesn't have to be Akamai/Google-specific. I'll investigate the failing nodes further and let you know if I find something that points to Google/Akamai-specific problems.

> No, PMTUD is fine in both IPv4 and IPv6. What is broken is people wrongly recommending to break and/or filter ICMP and thus indeed breaking PMTUD.

There's a critical mass of broken PMTUD on the internet (for whatever reasons). It does not matter whose fault it is; the end result is the same - the mechanism cannot be relied upon if you actually care about service quality.

From where I'm sitting, Google is advertising me an IPv6 TCP MSS of 1386. That speaks volumes. I don't believe for a second that my local Google cluster is on links with an MTU of 1434; the clamped TCP MSS must intentionally have been configured, and the only reason I can think of to do so is to avoid PMTUD. What works fine in theory sometimes fails operationally (cf. 6to4). Insisting that there exists no problem because it's just everyone else who keeps screwing it up doesn't change operational realities.

> I also have to note that in the 10+ years of having IPv6 we rarely saw PMTU issues, and if we did, contacting the site that was filtering fixed the issue.

Looking at it from the content side, users using IPv6 tunnels are in a tiny, tiny minority, while still managing to be responsible for a majority of trouble reports. Our stuff reacts to ICMPv6 PTBs, so it's not *all* tunnel users that get in trouble at the same time; it's just that they're susceptible to problems such as:

* Dropping ICMPv6 PTBs emitted by their CPE/tunnel ingress in their computer's personal/local firewall.
* The Internet tunnel ingress router rate-limiting ICMPv6 generation. For example, Juniper has a hard 50 pps ICMP generation limit per FPC, and at least one Cisco platform has 100/10 by default. Given enough traffic on the tunnel router, this limit will be exceeded more or less continuously. See the thread «MTU handling in 6rd deployments», btw.

Native users are immune to these problems, because they do not have to use PMTUD.

> The two 'workarounds' you mention are all on the *USER* side (RA MTU) or in-network, where you do not know if the *USER* has a smaller MTU.

LAN RA MTU, yes. TCP MSS, no - it can be done in the ISP's tunnel router.

> Hence touching it in the network is a no-no.

It appears to me that the ISPs that are deploying tunnels (6rd) for their users consider these a yes-yes. Presumably because they've realised that reducing reliance on PMTUD is in their customers' best interest, as it gives the best user experience. Is there *any* ISP in the world that does 6rd that does *not* do TCP MSS clamping and/or reduced LAN RA MTUs? (Or, for that matter, does IPv4 through PPPoE and does not do TCP MSS clamping?) For what it's worth, the vast majority of tunneled IPv6 traffic we see comes from ISPs with 6rd, which generally works fine due to these workarounds. Thankfully.

«this must be a major issue for everybody using IPv6 tunnels»
«MTU 1480 MSS 1220 = fix»
«the 1480 MTU and 1220 MSS numbers worked for my pfsense firewall»
«The only thing that worked here is 1280 MTU / 1220 MSS»
«clamping the MSS to 1220 seems to have fixed the problem for me»
«I changed the MSS setting [...] for the moment Google pages are loading much better»

This is all perfectly consistent with common PMTUD malfunctioning / tunnel suckage.

NOTHING to do with tunnels,
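[Editor's note: Tore's second bullet, the per-FPC ICMP generation cap, is worth a worked illustration. A router with a fixed per-second budget for generating ICMPv6 errors will, once demand exceeds the cap, silently drop the excess Packet Too Big messages, and those senders never learn the path MTU. The 50 pps figure is the Juniper limit cited above; the model is a deliberately simple fixed budget, not any vendor's exact algorithm.]

```python
def ptb_outcomes(demand_per_sec: int, limit_pps: int = 50):
    """How many Packet Too Big messages get sent vs. silently dropped
    in one second, given a fixed ICMP generation budget."""
    sent = min(demand_per_sec, limit_pps)
    dropped = demand_per_sec - sent
    return sent, dropped

print(ptb_outcomes(30))   # quiet tunnel box: every sender is informed
print(ptb_outcomes(400))  # busy tunnel box: most senders black-hole
```

The asymmetry is the point: the breakage is invisible on a quiet box and near-total on a busy one, which matches tunnels working in testing and failing under production load.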
Re: Some very nice broken IPv6 networks at Google and Akamai (Was: Some very nice IPv6 growth as measured by Google)
* Nick Hilliard

> On 09/11/2014 11:00, Tore Anderson wrote:
> > Only if Google and Akamai are universally broken, which does not seem to have been the case. I tested Google from the RING at 23:20 UTC yesterday:
> did you do a control run on a known working site?

No. I feel that 250+ successes vs 10 failures is enough to conclude that Akamai and Google are *not* universally broken, far from it. Thus refuting the claim that «Google and Akamai IPv6 are currently broken, enabling IPv6 thus breaks connectivity to those sites». Whatever broke, it must have been much more local than that, or only occurring under certain conditions (e.g., tunnels dependent on PMTUD).

> Not all ring nodes have working ipv6.

Exactly. That's a likely explanation for (some of) the 10 failures. I redid the tests now, and the failing nodes were:

beanfield01.ring.nlnog.net
bluezonejordan01.ring.nlnog.net
claranet02.ring.nlnog.net
hosteam01.ring.nlnog.net
keenondots01.ring.nlnog.net
maxitel01.ring.nlnog.net
nicchile01.ring.nlnog.net
occaid01.ring.nlnog.net
popsc01.ring.nlnog.net
rackfish01.ring.nlnog.net
robtex01.ring.nlnog.net

Of these, only three were able to ping 2a02:c0::1, which I know should respond fine. The other ones got various "no route to host", "destination beyond scope of source", and stuff like that. The three that had working IPv6 connectivity were:

hosteam01.ring.nlnog.net
nicchile01.ring.nlnog.net
occaid01.ring.nlnog.net

hosteam01 and occaid01 have defective local DNS; they can't resolve anything, it seems. So nothing to do with Google and Akamai there. nicchile01 is the only one that looks interesting, as it works for Google but not Akamai:

redpilllinpro@nicchile01:~$ wget -6 --header "User-Agent: foo" -O /dev/null http://www.akamai.com/images/img/banners/entertainment-home-page-banner-932x251.jpg
--2014-11-09 12:03:41-- http://www.akamai.com/images/img/banners/entertainment-home-page-banner-932x251.jpg
Resolving www.akamai.com (www.akamai.com)... 2600:1419:7:185::22d9, 2600:1419:7:189::22d9
Connecting to www.akamai.com (www.akamai.com)|2600:1419:7:185::22d9|:80... failed: Connection refused.
Connecting to www.akamai.com (www.akamai.com)|2600:1419:7:189::22d9|:80... failed: Connection refused.

However, tcpdump reveals that this isn't Akamai's doing, as it's ICMP errors originating from a NIC Chile-owned IP address:

12:06:19.388093 IP6 2001:1398:32:177::40 > 2001:1398:3:120:200:1:120:28: ICMP6, destination unreachable, unreachable port, 2600:1419:7:185::22d9 tcp port 80, length 88
12:06:19.389095 IP6 2001:1398:32:177::40 > 2001:1398:3:120:200:1:120:28: ICMP6, destination unreachable, unreachable port, 2600:1419:7:189::22d9 tcp port 80, length 88

Perhaps they have firewalled out Akamai for some reason? In any case: in summary, I see *zero* evidence of ubiquitous IPv6 problems with Google and Akamai. So ISPs should not worry about deploying IPv6, at least if they're doing it native and don't expose themselves to PMTUD breakage.

Tore
Re: Some very nice broken IPv6 networks at Google and Akamai (Was: Some very nice IPv6 growth as measured by Google)
On 2014-11-09 12:00, Tore Anderson wrote:

> Only if Google and Akamai are universally broken, which does not seem to have been the case. I tested Google from the RING at 23:20 UTC yesterday:

And Google confirmed that they fixed something; we'll never really know what they fixed, though.

Your test was done from colocated hosts, while real people use access networks. Thus while such a test gives insight that some of it works, it does not cover corner cases.

Also note that the Akamai problem (which still persists) is a random one. Hence, fetching one URL makes it pure luck whether it works or not. As a generic page has multiple objects, though, you'll hit it much quicker.

> There's a critical mass of broken PMTUD on the internet (for whatever reasons). It does not matter whose fault it is; the end result is the same - the mechanism cannot be relied upon if you actually care about service quality. From where I'm sitting, Google is advertising me an IPv6 TCP MSS of 1386. That speaks volumes. I don't believe for a second that my local Google cluster is on links with an MTU of 1434; the clamped TCP MSS must intentionally have been configured, and the only reason I can think of to do so is to avoid PMTUD. What works fine in theory sometimes fails operationally (cf. 6to4). Insisting that there exists no problem because it's just everyone else who keeps screwing it up doesn't change operational realities.

I am not 'insisting' that there is no problem with PMTUD. I am stating that the problem has to be fixed at the source, not hidden in the network.

> Looking at it from the content side, users using IPv6 tunnels are in a tiny, tiny minority, while still managing to be responsible for a majority of trouble reports.

Maybe as those users are more technically experienced and are able to get their message out, while non-techie users just disable IPv6, as is advised in a LOT of places? :)

[..]

> Native users are immune to these problems, because they do not have to use PMTUD.

You are forgetting the little fact that 'native' is a really strange word. Quite a few DSL deployments use PPPoE etc. There are also a lot of 'native' deployments out there that use 6rd.

Instead of just coming with "TUNNELS SUCK@$!@#$%^!*@%!", actually contact the networks that are broken and try to get them to fix the problem. You might not want to fix those as it is not your problem, but it is a problem for access networks.

Note btw that Google is not stating anything about the problem they had. And Akamai, well, they are still digging. Thus PMTUD might be an issue; it might also be something else completely. Without insight into those systems, one just has to guess.

> The two 'workarounds' you mention are all on the *USER* side (RA MTU) or in-network, where you do not know if the *USER* has a smaller MTU.
> LAN RA MTU, yes. TCP MSS, no - it can be done in the ISP's tunnel router.

Do you really suggest making the Internet have an MTU of 1280? :)

> Hence touching it in the network is a no-no.
> It appears to me that the ISPs that are deploying tunnels (6rd) for their users consider these a yes-yes. Presumably because they've realised that reducing reliance on PMTUD is in their customers' best interest, as it gives the best user experience. Is there *any* ISP in the world that does 6rd that does *not* do TCP MSS clamping and/or reduced LAN RA MTUs? (Or, for that matter, does IPv4 through PPPoE and does not do TCP MSS clamping?) For what it's worth, the vast majority of tunneled IPv6 traffic we see comes from ISPs with 6rd, which generally works fine due to these workarounds. Thankfully.

Till people start using non-TCP protocols, and everything breaks. Hence, don't hide the fact; instead, fix it.

[..] That is indeed an assumption, as we can't see the Google/Akamai end of the connection. If you see failures on MTU=1500 links, I think there must be at least two distinct problems at play. When users report «MTU 1480 MSS 1220 = fix», then that is extremely indicative of a PMTUD problem. For the Google case that was reported.
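[Editor's note: the encapsulation arithmetic behind the 6rd/PPPoE cases and Jeroen's "Internet MTU of 1280" jab. Each layer of encapsulation eats into the 1500-byte Ethernet MTU (8 bytes for PPPoE, 20 bytes for the outer IPv4 header of a 6in4/6rd tunnel), and 1280 is the minimum MTU the IPv6 specification guarantees. The overhead figures are standard protocol header sizes; the table layout is editorial.]

```python
ETHERNET_MTU = 1500
IPV6_MIN_MTU = 1280  # floor guaranteed by the IPv6 spec (RFC 2460)

# Bytes of per-packet overhead each encapsulation adds:
OVERHEAD = {
    "native Ethernet": 0,
    "PPPoE": 8,            # PPP (2) + PPPoE (6)
    "6in4/6rd": 20,        # outer IPv4 header
    "6rd over PPPoE": 28,  # both of the above stacked
}

for name, oh in OVERHEAD.items():
    print(f"{name}: effective MTU {ETHERNET_MTU - oh}")
```

So a 6rd customer typically sees a 1480 (or 1472) byte IPv6 MTU, which is exactly why those ISPs ship reduced RA MTUs and TCP MSS clamping rather than letting every connection discover the deficit via PMTUD.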
Re: Some very nice broken IPv6 networks at Google and Akamai (Was: Some very nice IPv6 growth as measured by Google)
On Sun, Nov 09, 2014 at 08:03:01PM +0100, Jeroen Massar wrote:

> > No. I feel that 250+ successes vs 10 failures is enough to conclude that Akamai and Google are *not* universally broken, far from it.
>
> Testing from colo'd boxes on well-behaved networks (otherwise they would not know of or be part of the RING), while the problem lies with actual home users, is quite a difference.

I can't comment on the validity of the tests performed, but I'd like to point out one thing: I like that the NLNOG RING is very diverse, especially in terms of the nodes' IPv6 connectivity. Some hosts are behind exotic 6to4 NATted tunnels, others behind regular tunnels, some inadvertently block useful ICMPv6 messages, some networks are just broken. For NLNOG RING applications we mandate that there is one globally unique IPv6 address on the host; we do not specify how this should be accomplished. This leads to some variety, and not all of those implementations I would describe as "well behaved".

Kind regards, Job
Re: Some very nice broken IPv6 networks at Google and Akamai (Was: Some very nice IPv6 growth as measured by Google)
* Jeroen Massar

> Testing from colo'd boxes on well-behaved networks (otherwise they would not know or be part of the RING), while the problem lies with actual home users, is quite a difference.

So far you've been claiming that the problem lies with Google or Akamai. If true - and I don't dispute that it is - then testing from the RING should work just as well as from any home network. And, as Job has pointed out, the RING nodes are not all «well behaved».

> Also the statement "universally broken" comes from you.

I refer to this blanket statement of yours, responding to my paraphrasing of you and Yannis:

Yannis: «We're enabling IPv6 on our CPEs»
Jeroen: «And then getting broken connectivity to Google»
You: «That statement is correct though. As Google and Akamai IPv6 are currently broken, enabling IPv6 thus breaks connectivity to those sites. Not enabling IPv6 thus is a better option in such a situation.»

In order for this to be correct, Google and Akamai must necessarily be universally broken over IPv6. If on the other hand the problem is not universal, but only occurring in certain corner cases (such as when hitting the cluster in Mexico City, when the client is behind a sub-1500B MTU link, or whatever), then you have no reason to claim that ISPs in general (like OTE) will break connectivity to Akamai and Google when they enable IPv6.

> > Thus refuting the claim that «Google and Akamai IPv6 are currently broken, enabling IPv6 thus breaks connectivity to those sites».
> As Google has admitted fixing it, you have been proven wrong.

I don't dispute that there is or has been *a* problem, only the scope of it. The way I see it, most of the available data points to there indeed being a problem specific to tunnels/PMTUD (which I've said all along, cf. "tunnels suck"). Perhaps Google turned up a new cluster and forgot to enable TCP MSS clamping, or something like that. No idea about the Akamai one.

> Actually, I wonder why you are trying to fight so hard that various people have reported this problem. You are apparently not working for either Google or Akamai, you are not an access network, your network is not involved either; hence... what is your problem with such a statement?

My problem is with your claim that «not enabling IPv6 thus is a better option in such a situation». Whatever the problem is or was, it did not affect everyone - most likely it affected just a tiny fraction of users - otherwise I think we would have heard way more complaints from all over. There are millions of IPv6 users out there in the world, and without Google (+YouTube/Gmail) and Akamai (+Facebook), the internet doesn't work.

With no more specifics known about what went wrong, ISPs have zero reason to stall their IPv6 rollouts, since there is no reason to assume that they will be impacted by the problem. So: OTE.gr, Telefonica.cz, Telenor.no, Telepac.pt, and others - go go go!

BTW: Some of our customers are heavy users of Akamai for video streaming, and many have lots of interaction with various Google services. So I have plenty of reason to care about any problem of theirs.

Tore
Re: Some very nice broken IPv6 networks at Google and Akamai (Was: Some very nice IPv6 growth as measured by Google)
On 11/9/14 12:27 PM, Tore Anderson wrote:

> So far you've been claiming that the problem lies with Google or Akamai. If true - and I don't dispute that it is - then testing from the RING should work just as well as from any home network.

No, that's not true at all. Eyeball networks have very different characteristics than colos. Sure, there will be some overlap, but your statement above is demonstrably false.

It's also true that both Google and Akamai have admitted problems with IPv6, and Google claims to have fixed them. So at this point it's not at all clear what you're arguing about, other than an Asperger'y need to prove that something you said was correct at some point in some context. So can you please just let it go, and let's return this list back to its normally high S::N?

Doug
Re: Some very nice broken IPv6 networks at Google and Akamai
On 2014-11-09 22:10, Lorenzo Colitti wrote:

> On Sat, Nov 8, 2014 at 11:48 PM, Jeroen Massar jer...@massar.ch wrote: "The issue with IPv6 access to Google should now be resolved. Please let us know if you're still having problems." "The fun question of course is: what/why/how the heck happened?"
>
> Another fun question is why folks are relying on PMTUD instead of adjusting their MTU settings (e.g., via RAs).

Because why would anybody want to penalize their INTERNAL network? Does Google run a non-1500 MTU internally? I hope you are running jumbo packets at least internally (the ~9000-byte ones, not the 65k+ ones in IPv6, though ;) Also, nobody knows if a link somewhere in the middle of the path to their destination will have a non-1500 MTU.

> But relying on PMTUD still imposes a 1-RTT penalty on every TCP connection to an IPv6 address you haven't talked to in the last few minutes. Why would you do that to your connection?

Because you can't know if that is always the case.

> As to what happened: what happened here is that some Google servers temporarily stopped doing MSS clamping. That was an outage, and AIUI it has since been fixed.

Thanks for admitting AND explaining what the problem is. As you work at Google: ever heard of this QUIC protocol that does not use TCP? Maybe you want to ask your colleagues about that :)

> (Some parts of) Google infrastructure do not do PMTUD for the latency reasons above and for reasons similar to those listed in https://tools.ietf.org/html/draft-v6ops-jaeggli-pmtud-ecmp-problem-00 .

As such, you are ON PURPOSE breaking PMTUD, instead trying to fix it with some other band-aid. And thus you are hiding problems that will happen when QUIC actually starts to get used? Or are you going to just reset the Internet to 1280? :)

Greets, Jeroen
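[Editor's note: Jeroen's QUIC objection in numbers. MSS clamping lives in a TCP option, so only TCP senders are told to shrink; a UDP datagram (as QUIC uses) that exceeds the path MTU still depends on an ICMPv6 Packet Too Big reaching the sender. The path MTU of 1446 below is a hypothetical value (1386 MSS + 60 bytes of IPv6+TCP headers), chosen to match the clamped MSS Tore reports; it is an illustration, not a measured figure.]

```python
IPV6_HDR, TCP_HDR, UDP_HDR = 40, 20, 8

def tcp_packet_fits(clamped_mss: int, path_mtu: int) -> bool:
    # A clamped TCP sender never puts more than mss + 60 bytes on the wire.
    return clamped_mss + IPV6_HDR + TCP_HDR <= path_mtu

def udp_packet_fits(payload: int, path_mtu: int) -> bool:
    # Nothing clamps UDP: the datagram goes out at full size.
    return payload + IPV6_HDR + UDP_HDR <= path_mtu

PATH_MTU = 1446  # hypothetical narrow link, consistent with MSS 1386

print(tcp_packet_fits(1386, PATH_MTU))  # True: the clamp rescues TCP
print(udp_packet_fits(1452, PATH_MTU))  # False: a full-sized datagram
                                        # black-holes without a PTB
```

This is the crux of the disagreement: the clamp makes the web (TCP) work over the narrow path, while any non-TCP traffic still needs working PMTUD or a conservatively small datagram size.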
Re: Some very nice broken IPv6 networks at Google and Akamai (Was: Some very nice IPv6 growth as measured by Google)
On 2014-11-09 21:27, Tore Anderson wrote:

> > Testing from colo'd boxes on well-behaved networks (otherwise they would not know or be part of the RING), while the problem lies with actual home users, is quite a difference.
> So far you've been claiming that the problem lies with Google or Akamai.

Google has acknowledged it; apparently they are doing MSS clamping on *THEIR* side (while they don't know what your network looks like ;). Akamai is still looking into it.

> If true - and I don't dispute that it is - then testing from the RING should work just as well as from any home network.

Completely different environment.

> And, as Job has pointed out, the RING nodes are not all «well behaved».

You had 10 nodes that failed, which demonstrates that. As you have contact with these folks, ask them to fix that situation.

> > Also the statement "universally broken" comes from you.
> I refer to this blanket statement of yours, responding to my paraphrasing of you and Yannis:
> Yannis: «We're enabling IPv6 on our CPEs»
> Jeroen: «And then getting broken connectivity to Google»
> You: «That statement is correct though. As Google and Akamai IPv6 are currently broken, enabling IPv6 thus breaks connectivity to those sites. Not enabling IPv6 thus is a better option in such a situation.»
> In order for this to be correct, Google and Akamai must necessarily be universally broken over IPv6.

Why are you so hung up on words, while by your own admission you are not a native speaker? Can you please stop bickering over those words? Google has admitted they broke something and fixed it. Stop hanging yourself up on it.

Greets, Jeroen
Re: Some very nice broken IPv6 networks at Google and Akamai (Was: Some very nice IPv6 growth as measured by Google)
On 2014-11-09 22:10, Tore Anderson wrote: * Jeroen Massar Also note that the Akamai problem (which still persists) is a random one. Hence fetching one URL is just a pure luck thing if it works or not. As a generic page has multiple objects though, you'll hit it much quicker. Hm. As I've said before - WFM. Any more information you could provide to help me try to reproduce it? Try reading the links provided. They contain the details that users have provided. Note again: Google problem has been fixed (spoofing MSS is not fixing the problem). The Akamai things seems to still be in progress. I am not 'insisting' that there is no problem with PMTUD. «No, PMTUD is fine in both IPv4 and IPv6», you said... Fine is not perfect. Also, taking single sentences out of somebodies comment does make the whole sentence. I have stated several times that there ARE issues with PMTUD and that people need to fix them instead of hide them. Again, please stop getting hung up on words. I am stating that the problem has to be fixed at the source, not hidden in the network. In an ideal world, perhaps. It's like with 6to4; if all relay operators did a wonderful job, and no-one filtered proto-41, and nobody did NAT44, then 6to4 would just be hunky-dory. But it's just too much brokenness out there. Same with PMTUD. It's beyond repair, IMHO. The pragmatic thing is to accept that and move on. What you are saying is to just stick to an MTU of 1280 and TCP everything forgetting about ever being able to move to anything else than using TCP. As QUIC is deployed and HTTP/2 is coming, forget about that. You will need to address these concerns properly. [..] Or that the tunnel ingress routers rate-limit ICMPv6 error generation. sixxsd does not have this problem. There are no rate limits. Thus at least everybody behind SixXS tunnels will not have that issue. Contact your vendor to resolve your problems. You are forgetting the little fact that native is a really strange word. 
Quite a few DSL deployments use PPPoE etc. There are also a lot of
'native' deployments out there that use 6rd.

> In my experience, these ISPs deploy workarounds to avoid PMTUD:
> TCP MSS clamping, and LAN RA MTUs (for IPv6). That helps.

For TCP, not for anything else. Chrome speaks QUIC to various Google
properties.

Instead of just coming with «TUNNELS SUCK@$!@#$%^!*@%!», actually
contact the networks that are broken and try to get them to fix the
problem. You might not want to fix those as it is not your problem,
but it is a problem for access networks.

> I think PMTUD on the internet is broken beyond salvation

Then please give up on it and let the rest of the world care about it;
notify folks and let them fix the problem properly.

Greets,
 Jeroen
Some very nice broken IPv6 networks at Google and Akamai (Was: Some very nice IPv6 growth as measured by Google)
On 2014-11-08 18:38, Tore Anderson wrote:
> * Jeroen Massar
>> The only link: they are all using IPv6.
>
> You are trying to make this an OTE link.

I have never stated anything like that. Though, you likely take that
from the fact that the reply followed in that thread.

> Yannis: «We're enabling IPv6 on our CPEs»
> Jeroen: «And then getting broken connectivity to Google»
>
> I'm not a native speaker of English, but I struggle to understand it
> any other way than you're saying there's something broken about
> Yannis' deployment. I mean, your reply wasn't even a standalone
> statement, but a continuation of Yannis' sentence. :-P

That statement is correct though. As Google and Akamai IPv6 are
currently broken, enabling IPv6 thus breaks connectivity to those
sites. Not enabling IPv6 thus is a better option in such a situation.

But it was just a hook into it. Don't further worry about it.

> Anyway, I'm relieved to hear that there's no reason to suspect IPv6
> breakage in OTE. As we host a couple of the top-10 Greek sites, one
> of which has IPv6, we're dependent on big Greek eyeball networks like
> OTE to not screw up their IPv6 deployment - it is *I* who get in
> trouble if they do. :-)

But your network was not involved in the above statement. And if you
monitor your sites correctly, also from non-native setups, then you
should be fine.

>> PMTUD is fine. What sucks is 'consultants' advising blocking ICMPv6
>> because that is what we do in IPv4 and that some hardware/software
>> gets broken once in a while.
>
> PMTUD is just as broken in IPv4, too.

No, PMTUD is fine in both IPv4 and IPv6. What is broken is people
wrongly recommending to break and/or filter ICMP and thus indeed
breaking PMTUD.

> PMTUD has *never* been «fine», neither for IPv4 nor for IPv6. That's
> why everyone who provides links with MTUs <1500 resorts to
> workarounds such as TCP MSS clamping

I am one of the people on this planet providing a LOT of links with
MTUs <1500 and we really will never resort to clamping the MSS. It
does not fix anything.
It only hides the problem and makes diagnosing issues problematic, as
one does not know if that trick is being applied or not.

I also have to note that in the 10+ years of having IPv6 we rarely saw
PMTU issues, and if we did, contacting the site that was filtering
fixed the issue.

> reducing MTU values in LAN-side RAs,

That is an even worse offender than MSS clamping, though at least it
is visible in tracepath6. Note that you are limiting packet sizes on
your local network because somewhere some person is filtering ICMP.

> so that reliance on PMTUD working is limited as much as possible. If
> you want to deliver an acceptable service (either as an ISP or as a
> content hoster), you just *can't* depend on PMTUD.

The two 'workarounds' you mention are all on the *USER* side (RA MTU)
or in-network, where you do not know if the *USER* has a smaller MTU.
Hence touching it in the network is a no-no.

> Even when PMTUD actually works as designed it sucks, as it causes
> latency before data may be successfully transmitted.

I fully agree with that statement. But this Internet thing is global
and one cannot enforce small or large packets on the world just
because some technologies do not support them.

Note that PMTUD results are typically cached per destination.
Unfortunately there is no way for a router to securely say "this MTU
applies to the whole /64 or /32"; that would have been beneficial.

See the threads I referenced, they are still in the above quoted text.
Note that the Google case is consistent: (as good as) every IPv6
connection breaks. The Akamai case is random: sometimes it just works
as you hit good nodes in the cluster, sometimes it breaks.
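The per-destination caching point above can be illustrated with a toy
cache (a hypothetical structure, not any real host stack's code):
because a learned path MTU applies to one destination address only,
every address behind the same narrow hop has to rediscover the MTU
separately, which is why a per-/64 or per-/32 announcement would have
been beneficial:

```python
from ipaddress import ip_address

# Hypothetical host-side PMTU cache, keyed on the destination address.
pmtu_cache: dict = {}

def record_ptb(dst: str, mtu: int) -> None:
    """Store the MTU learned from an ICMPv6 Packet Too Big message."""
    pmtu_cache[ip_address(dst)] = mtu

def pmtu_for(dst: str, link_mtu: int = 1500) -> int:
    """Path MTU to use: learned value, else the local link MTU."""
    return pmtu_cache.get(ip_address(dst), link_mtu)

record_ptb("2001:db8::1", 1480)
print(pmtu_for("2001:db8::1"))  # 1480, learned from the PTB
print(pmtu_for("2001:db8::2"))  # 1500, same /64 but must rediscover
```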
> I see in the threads referenced statements such as:
>
> «this must be a major issue for everybody using IPv6 tunnels»
> «MTU 1480 MSS 1220 = fix»
> «the 1480 MTU and 1220 MSS numbers worked for my pfsense firewall»
> «The only thing that worked here is 1280 MTU / 1220 MSS»
> «clamping the MSS to 1220 seems to have fixed the problem for me»
> «I changed the MSS setting [...] for the moment Google pages are
> loading much better»
>
> This is all perfectly consistent with common PMTUD malfunctioning /
> tunnel suckage.

NOTHING to do with tunnels, everything to do with somebody not
understanding PMTUD and breaking it, be that on purpose or not.

Note that both Google and Akamai know very well about PMTUD, and up to
about a week ago both were running perfectly fine.

> I'm therefore very sceptical that this problem would also be
> experienced by users with 1500 byte MTU links.

Tested failing also on MTU=1500 links.

> (Assuming there's only a single problem at play here.)

That is indeed an assumption, as we can't see the Google/Akamai end of
the connection. Note that in the Akamai case it is a random thing, it
happens to random nodes inside the cluster (at least, that is my
assumption...)

In both cases, it is hard to say what exactly breaks, as only the
people in those [...]
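The symptoms quoted in the message above are what a PMTUD black hole
produces, which a toy simulation can show (a simplified model with
hypothetical names, not a real TCP stack): small packets such as the
handshake get through, full-size data segments vanish without any
feedback, and shrinking the MSS merely keeps every segment under the
hidden limit:

```python
PATH_MTU = 1480      # a narrower hop somewhere on the path
PTB_FILTERED = True  # some middlebox drops ICMPv6 Packet Too Big

def send(segment_len: int) -> str:
    """Fate of one segment sent across the broken path."""
    if segment_len <= PATH_MTU:
        return "delivered"
    if PTB_FILTERED:
        # The sender never learns why; it just retransmits and stalls.
        return "black-holed"
    # With working PMTUD the sender would lower its path MTU and retry.
    return f"ptb mtu={PATH_MTU}"

print(send(80))    # delivered: handshake works, page starts loading
print(send(1500))  # black-holed: full-size data stalls indefinitely
```

This is why clamping the MSS to 1220 "fixes" the stall for the users
quoted above while, as Jeroen argues, hiding rather than fixing the
actual breakage.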
Re: Some very nice broken IPv6 networks at Google and Akamai
>> I'm not a native speaker of English, but I struggle to understand it
>> any other way than you're saying there's something broken about
>> Yannis' deployment. I mean, your reply wasn't even a standalone
>> statement, but a continuation of Yannis' sentence. :-P
>
> That statement is correct though. As Google and Akamai IPv6 are
> currently broken, enabling IPv6 thus breaks connectivity to those
> sites. Not enabling IPv6 thus is a better option in such a situation.

I'm afraid I don't see the supporting evidence here. From my point of
view, Google and Akamai IPv6 both work just fine.

I happen to be in Norway, just like Tore - but we are in different
ASes and as far as I know we also use different Akamai and Google
cache instances. No specific problems that I can see.

Steinar Haug, AS 2116
Re: Some very nice broken IPv6 networks at Google and Akamai
hey,

> I'm afraid I don't see the supporting evidence here. From my point of
> view, Google and Akamai IPv6 both work just fine.

Concur. Both work just fine from my POV and I don't see lower than
usual IPv6 traffic levels.

--
tarko
Re: Some very nice broken IPv6 networks at Google and Akamai
Google did have some issues, look at the outage list. They are
resolved now:

  Damian Menscher <dam...@google.com>  6:44 PM (3 hours ago)
  The issue with IPv6 access to Google should now be resolved. Please
  let us know if you're still having problems.

--
Joe Hamelin, W7COM, Tulalip, WA, 360-474-7474

On Sat, Nov 8, 2014 at 7:09 PM, Brian E Carpenter
<brian.e.carpen...@gmail.com> wrote:
> On 09/11/2014 09:19, sth...@nethelp.no wrote:
>>>> I'm not a native speaker of English, but I struggle to understand
>>>> it any other way than you're saying there's something broken about
>>>> Yannis' deployment. I mean, your reply wasn't even a standalone
>>>> statement, but a continuation of Yannis' sentence. :-P
>>>
>>> That statement is correct though. As Google and Akamai IPv6 are
>>> currently broken, enabling IPv6 thus breaks connectivity to those
>>> sites. Not enabling IPv6 thus is a better option in such a
>>> situation.
>>
>> I'm afraid I don't see the supporting evidence here. From my point
>> of view, Google and Akamai IPv6 both work just fine.
>
> I have to say they both look a bit spotty from Honolulu right now,
> e.g.
>
> C:\windows\system32>ping -6 www.google.com
>
> Pinging www.google.com [2a00:1450:4009:80b::1013] with 32 bytes of data:
> Destination host unreachable.
> Destination host unreachable.
> Destination host unreachable.
> Reply from 2a00:1450:4009:80b::1013: time=376ms
>
> Ping statistics for 2a00:1450:4009:80b::1013:
>     Packets: Sent = 4, Received = 1, Lost = 3 (75% loss),
> Approximate round trip times in milli-seconds:
>     Minimum = 376ms, Maximum = 376ms, Average = 376ms
>
> but that may be some other issue entirely.
>
>    Brian
>
>> I happen to be in Norway, just like Tore - but we are in different
>> ASes and as far as I know we also use different Akamai and Google
>> cache instances. No specific problems that I can see.
>>
>> Steinar Haug, AS 2116