Re: Open source Netflow analysis for monitoring AS-to-AS traffic
On 27/03/24 01:04, Brian Knight via NANOG wrote:

> What's presently the most commonly used open source toolset for monitoring
> AS-to-AS traffic? I want to see with which ASes I am exchanging the most
> traffic across my transits and IX links. I want to look for opportunities
> to peer so I can better sell expansion of peering to upper management.
>
> …
>
> pmacct seems to be good at gathering Netflow, but doesn't seem to analyze
> data. I don't see any concise howto guides for setting this up for my
> purpose, however.

pmacct will do what you want and it's not particularly difficult to set it up. For example, you can aggregate data into a database using:

aggregate[in]: src_as,src_net,src_mask
aggregate[out]: dst_as,dst_net,dst_mask

Now you can issue SQL queries that tell you which ASes or prefixes you send/receive the most bits or packets to/from.

Tore
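To illustrate the kind of query this enables, here is a sketch. The table and column names (`acct`, `as_dst`, `bytes`) follow a commonly used pmacct SQL schema, but they are assumptions; check your own `sql_table` settings. SQLite stands in for the real database so the example is self-contained.

```python
# Rank peer ASes by traffic volume, the way you might against a
# pmacct SQL database. Table/column names are assumed, not canonical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE acct (as_dst INTEGER, bytes INTEGER)")
# Fake accounting rows: (destination AS, bytes sent)
conn.executemany(
    "INSERT INTO acct VALUES (?, ?)",
    [(64500, 10_000), (64500, 5_000), (64501, 2_000), (64502, 20_000)],
)

top_talkers = conn.execute(
    """
    SELECT as_dst, SUM(bytes) AS total_bytes
    FROM acct
    GROUP BY as_dst
    ORDER BY total_bytes DESC
    LIMIT 10
    """
).fetchall()

print(top_talkers)  # → [(64502, 20000), (64500, 15000), (64501, 2000)]
```

The same `GROUP BY`/`ORDER BY` shape works for packets, or for `as_src` on the ingress table.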
Re: Reverse Traceroute
* Rolf Winter

> If you would like to play with reverse traceroute, the easiest option
> is to work with the client and use one of the public server instances
> (https://github.com/HSAnet/reverse-traceroute/blob/main/ENDPOINTS).
> If you would be willing to host a public server instance yourself,
> please reach out to us.

I suggest you get in touch with the fine folks at NLNOG RING and ask if they would be interested in setting this up on the 600+ RING nodes all over the world. See https://ring.nlnog.net/.

Tore
Re: Rack rails on network equipment
* Andrey Khomyakov

> Interesting tidbit is that we actually used to manufacture custom rails for
> our Juniper EX4500 switches so the switch can be actually inserted from the
> back of the rack (you know, where most of your server ports are...) and not
> be blocked by the zero-U PDUs and all the cabling in the rack. Stock rails
> didn't work at all for us unless we used wider racks, which then, in turn,
> reduced floor capacity.
>
> As far as I know, Dell is the only switch vendor doing toolless rails so it's
> a bit of a hardware lock-in from that point of view.

Amen. I suspect that Dell is pretty much alone in realising that rack mount kits that require insertion/removal from the hot aisle are pure idiocy, since the rear of the rack tends to be crowded with cables, PDUs, and so forth.

This might be due to Dell starting out as a server manufacturer. *All* rack-mount servers on the market are inserted into (and removed from) the cold aisle of the rack, after all. The reasons that make this the only sensible thing for servers apply even more so for data centre switches.

I got so frustrated with this after having to remove a couple of decommissioned switches that I wrote a post about it a few years back: https://www.redpill-linpro.com/techblog/2019/08/06/rack-switch-removal.html

Nowadays I employ various strategies to facilitate cold aisle installation/removal, such as: reversing the rails if possible, attaching only a single rack ear (for four-post mounted equipment) or installing rivet nuts directly in the rack ears (for shallow two-post mounted equipment).

(Another lesson the data centre switch manufacturers could learn from the server manufacturers is to always include a BMC. I would *much rather* spend my serial console infrastructure budget on switches with built-in BMCs. That way I would get remote power control, IPMI Serial-Over-LAN and so on – all through a *single* Ethernet management cable.)

Tore
Re: Scanning activity from 2620:96:a000::/48
* Dobbins, Roland

> Scanning is part of the ‘background radiation’ of the Internet, and it’s
> performed by various parties with varying motivations. Of necessity, IPv6
> scanning is likely to be more targeted (were your able to discern any rhyme
> or reason behind the observed scanning patterns?).

The pattern appears to be a bunch of ICMPv6 pings sent to random addresses within the same /104; that is, the last 24 bits of each destination address appear randomised in each ping request. I don't know if they move on to another /104 after they are done with the first one, and so forth.

> iACLs, tACLs, CoPP, selective QoS for various ICMPv6 types/codes, et. al.
> should be configured in such a manner that 600pps of anything can’t cause an
> adverse impact to any network functions. Because actual bad actors are
> unlikely to voluntarily stop, even when requested to do so.

Clearly, and in this particular case my CP protections did their job successfully, fortunately, but that is kind of beside the point.

What I am wondering, though, is if it really should be considered okay for a good actor to launch what essentially amounts to a neighbour cache exhaustion DoS attack towards unrelated network operators (without asking first), just because bad actors might do the same.

Tore
Scanning activity from 2620:96:a000::/48
A couple of hours after midnight UTC, the control plane policers for unresolved traffic on a couple of our CE routers started being clogged with ping-scanning activity from 2620:96:a000::/48, which belongs to «Internet Measurement Research (SIXMA)» according to ARIN. Excerpt of this traffic (anonymised on our end):

11:21:05.016914 IP6 2620:96:a000::10 > 2001:db8:1234::f5:7a69: ICMP6, echo request, seq 0, length 16
11:21:05.016929 IP6 2620:96:a000::10 > 2001:db8:1234::12:ba74: ICMP6, echo request, seq 0, length 16
11:21:05.060045 IP6 2001:db8:1234::3 > 2620:96:a000::10: ICMP6, destination unreachable, unreachable address 2001:db8:1234::e7:f473, length 64
11:21:05.060060 IP6 2001:db8:1234::3 > 2620:96:a000::7: ICMP6, destination unreachable, unreachable address 2001:db8:1234::d4:c4a3, length 64
11:21:05.060419 IP6 2001:db8:1234::3 > 2620:96:a000::7: ICMP6, destination unreachable, unreachable address 2001:db8:1234::42:198a, length 64
11:21:05.064464 IP6 2620:96:a000::10 > 2001:db8:1234::4a:d4cd: ICMP6, echo request, seq 0, length 16
11:21:05.079645 IP6 2620:96:a000::10 > 2001:db8:1234::63:b58d: ICMP6, echo request, seq 0, length 16
11:21:05.097337 IP6 2620:96:a000::10 > 2001:db8:1234::24:1038: ICMP6, echo request, seq 0, length 16
11:21:05.111091 IP6 2620:96:a000::7 > 2001:db8:1234::8f:a126: ICMP6, echo request, seq 0, length 16
11:21:05.124112 IP6 2001:db8:1234::3 > 2620:96:a000::7: ICMP6, destination unreachable, unreachable address 2001:db8:1234::e6:70fc, length 64
11:21:05.124417 IP6 2001:db8:1234::3 > 2620:96:a000::10: ICMP6, destination unreachable, unreachable address 2001:db8:1234::bf:ca18, length 64
11:21:05.137509 IP6 2620:96:a000::10 > 2001:db8:1234::12:f0df: ICMP6, echo request, seq 0, length 16
11:21:05.142614 IP6 2620:96:a000::7 > 2001:db8:1234::8f:9ec6: ICMP6, echo request, seq 0, length 16

While the CP policer did its job and prevented any significant operational impact, the traffic did possibly prevent/delay legitimate address resolution attempts as
well as trigger loads of pointless address resolution attempts (ICMPv6 Neighbour Solicitations) towards the customer LAN. We just blocked the prefix at our AS border to get rid of that noise. Those ACLs are currently dropping packets at a rate of around 600 pps. I was just curious to hear if anyone else is seeing the same thing, and also whether or not people feel that this is an okay thing for this «Internet Measurement Research (SIXMA)» to do (assuming they are white-hats)? Tore
Re: Partial vs Full tables
* Michael Hare

> I'm considering an approach similar to Tore's blog where at some
> point I keep the full RIB but selectively populate the FIB. Tore,
> care to comment on why you decided to filter the RIB as well?

Not «as well», «instead». In the end I felt that running in production with the RIB and the FIB perpetually out of sync was too much of a hack, something that I would likely come to regret at a later point in time. That approach never made it out of the lab.

For example, simple RIB lookups like «show route $dest» would not have given truthful answers, which would likely have confused colleagues.

Even though we filter on the BGP sessions towards our transits, we still get all the routes in our RIB and can look them up explicitly if we need to (e.g., in JunOS: «show route hidden $dest»).

Tore
Re: Partial vs Full tables
* Saku Ytti

> On Fri, 5 Jun 2020 at 11:23, Tore Anderson wrote:
>
> > Sure you can, you just ask them. (We did.)
>
> And is it the same now? Some Ytti didn't 'fix' the config last night?
> Or NOS change which doesn't do conditional routes? Or they
> misunderstood their implementation and it doesn't actually work like
> they think it does. I personally always design my reliance to other
> people's clue to be as little as operationally feasible.

The way they answered the question showed that they had already considered this particular failure case and engineered their implementation accordingly. That is good enough for us.

Incorrect origination of a default route is, after all, just one of the essentially infinite ways our transit providers can screw up our services. Therefore it would make no sense to me to entrust the delivery of our business critical packets to a transit provider, yet at the same time not trust them to originate a default route reliably.

If we did not feel we could trust a transit provider, we would simply find another one. There are plenty to choose from.

Tore
Re: Partial vs Full tables
* Saku Ytti > On Fri, 5 Jun 2020 at 10:48, Tore Anderson wrote: > > > We started taking defaults from our transits and filtering most of the > > DFZ over three years ago. No regrets, it's one of the best decisions we > > ever made. Vastly reduced both convergence time and CapEx. > > Is this verbatim? I do not understand this question, sorry. > you cannot know how the operator originates default Sure you can, you just ask them. (We did.) Tore
Re: Partial vs Full tables
* James Breeden

> I come to NANOG to get feedback from others who may be doing this. We
> have 3 upstream transit providers and PNI and public peers in 2
> locations. It'd obviously be easy to transition to doing partial
> routes for just the peers, etc, but I'm not sure where to draw the
> line on the transit providers. I've thought of straight preferencing
> one over another. I've thought of using BGP filtering and community
> magic to basically allow Transit AS + 1 additional AS (Transit direct
> customer) as specific routes, with summarization to default for the
> rest. I'm sure there are other thoughts that I haven't had about this
> as well

We started taking defaults from our transits and filtering most of the DFZ over three years ago. No regrets, it's one of the best decisions we ever made. Vastly reduced both convergence time and CapEx.

Transit providers worth their salt typically include BGP communities you can use to selectively accept more-specific routes that you are interested in. You could, for example, accept routes learned by your transits from IX-es in your geographic vicinity.

Here's a PoC where we used communities to filter out all routes except those learned by our primary transit provider anywhere in Scandinavia, while using defaults for everything else: https://www.redpill-linpro.com/sysadvent/2016/12/09/slimming-routing-table.html

(Note that we went away from the RIB->FIB filtering approach described in the post, what we have in production is traditional filtering on the BGP sessions.)

Tore
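The import-policy logic described above can be sketched in plain Python (not router configuration). The community value 65000:1000, meaning «learned regionally», is invented for illustration; real transit providers document their own informational community values.

```python
# Sketch: keep community-tagged regional more-specifics, drop the
# rest of the DFZ, and rely on default routes for everything else.
# The community value 65000:1000 is hypothetical.
REGION_COMMUNITY = "65000:1000"
DEFAULTS = {"0.0.0.0/0", "::/0"}

def accept(prefix: str, communities: set[str]) -> bool:
    """Return True if the route should be kept."""
    if prefix in DEFAULTS:
        return True                         # always keep the defaults
    return REGION_COMMUNITY in communities  # keep regional specifics

routes = [
    ("0.0.0.0/0", set()),
    ("192.0.2.0/24", {"65000:1000"}),     # regional: kept
    ("198.51.100.0/24", {"65000:3000"}),  # elsewhere: dropped
]
kept = [p for p, c in routes if accept(p, c)]
print(kept)  # → ['0.0.0.0/0', '192.0.2.0/24']
```

On a real router the same decision would be an import policy matching on the provider's communities, with a catch-all reject.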
Re: [EXT] Re: rack rails
* Cummings, Chris

> Now that you say that, I think you're right. I am referring specifically to
> the EX4650 and they are the cheesy type where the rear half of the rail stays
> screwed in to the rack and the front half of the rail is attached to the
> switch. I assume it is the same on the QFX since they are very similar
> platforms. Basically they are that annoying type between rack ears and
> sliding rails where the device can separate completely from the rails.

Looking at the documentation (linked below), it would appear the EX4650 has the exact same rack-mount kit as the EX4500 and EX4600 do.

They all share the fundamental problem I'm talking about, namely that there are fixed mounting ears on the port side of the switch, which prevent removal through the cold aisle (assuming data centre/PSU-to-port airflow). The "sliders" are really just there to prevent the PSU end of the switch from sagging.

Tore

https://www.juniper.net/documentation/en_US/release-independent/junos/information-products/topic-collections/hardware/ex-series/ex4650/quick-start-ex4650.pdf (page 6)
Re: [EXT] Re: rack rails
* Chuck Anderson > The point is that the switches need to be removable without empty > space above/below, and ideally from the rear side of the rack. By > having extending/sliding rails, you can lift out or drop in the switch > after you slide it out. Then you can remove the rails. > > With fixed rails, you can't get the switch out without bending the ear > part of the rails when there are PDUs and other stuff in the way. Not necessarily. Even sliding rails must be constructed in a way that facilitates removal through the cold aisle side of the rack. That's not a given. One example of sliding rails that unfortunately do *not* allow for removal that way is the Edge-Core RKIT-100G-SLIDE: https://www.redpill-linpro.com/techblog/2020/01/17/new-routers.html (Ctrl+F Bonus) Tore
Re: rack rails
* Luke Guillory > I've had gear that came with a small rear support shelf that didn't had to > the height, RGB Networks BNPs for example. I'm pretty sure we've used these > with the BNPs one on top of the other. > > Page 16 in this PDF shows the shelf. > > http://www.konturm.ru/catalogy/df/bnp2xr_installation_guide_3.7.1_20160222.pdf Interesting, thanks! Such a shelf would do the trick if it is thin enough to fit in the tiny space between two devices mounted in adjacent rack units. Do you know if it is possible to buy this kind of shelf from somewhere (without an accompanying device)? Tore
Re: rack rails
* David Funderburk

> 2 - Do you know of any universal rail kits for 1U, 2U and 3U servers,
> routers, switches that work well? The brand names are nice but expensive.
> Thought I'd explore some cheaper options first. We use a lot of MikroTik, HP,
> Dell and some CISCO with a few other things here and there.

When it comes to network equipment meant for mounting in four-post data centre racks with PSU-to-port airflow, the included kits are usually anything but nice. The problem is that they typically only allow for insertion/removal through the rear of the rack (unlike servers, which are almost exclusively mounted through the front of the rack).

When a rack has been filled up, removal/insertion through the rear will often be essentially impossible due to cables, vertical PDUs and other stuff that gets in the way. Explained in pictures here: https://www.redpill-linpro.com/techblog/2019/08/06/rack-switch-removal.html

If someone knows of a generic rack mount kit for data centre switches that allows for insertion/removal through the front of the rack, i.e. from/to the cold aisle, I'd be very grateful. Best thing I've come up with so far is to use shelves, but that doubles the number of rack units I need to use (1U switch sitting on top of a 1U shelf...)

Tore
Re: Dual Homed BGP
* Baldur Norddahl

> If you join any peering exchanges, full tables will be mandatory. Some
> parties will export prefixes and then expect a more specific prefix received
> from your transit to override a part of the space received via the peering.

That would be a fundamentally flawed expectation, in my opinion. An AS that advertises a prefix to its peers must be prepared to carry traffic to that entire prefix via that peering circuit. There is simply no guarantee that a more-specific prefix advertised somewhere else will make it into the RIBs and FIBs of all the peers of that AS.

The AS might of course opt to do so anyway for traffic engineering purposes, but there is no assurance that it will actually work 100% of the time. When it doesn't, the AS in question would need to carry the traffic from the peering circuit across their own backbone. If the AS in question for some reason cannot do so, it would need to adjust its advertisements across the peering circuit so as to avoid falsely advertising reachability to unreachable destinations.

Tore
Re: FYI - Suspension of Cogent access to ARIN Whois
* David Guo via NANOG > Good News! But we still received several spams from Cogent for our RIPE and > APNIC ASNs. If you are an EU/EEA citizen, you may object to their use of your personal information for marketing purposes (or for any purpose at all), as well as request erasure. (Note: these rights do not extend to impersonal role addresses like n...@example.com or hostmas...@example.com.) According to https://www.cogentco.com/en/cogent-gdpr/data-privacy, this should be done by sending e-mail to datapriv...@cogentco.com. There is no circumstance in which a company can legally refuse an objection to processing of personal information for marketing purposes. Therefore, should they refuse (or claim compliance but continue to spam you), you have standing to file a complaint with your national data protection agency. A DPA is competent to levy fines for violations of the GDPR of up to €20M or 4% of annual global revenue, so there is a certain incentive to respect such objections. (It might be that citizens of California have similar rights under the CCPA, which came into force last week.) Tore
Re: ECN
* Saku Ytti

> Not true. Hash result should indicate discreet flow, more importantly
> discreet flow should not result into two unique hash numbers. Using
> whole TOS byte breaks this promise and thus breaks ECMP.
>
> Platforms allow you to configure which bytes are part of hash
> calculation, whole TOS byte should not be used as discreet flow SHOULD
> have unique ECN bits during congestion. Toke has diagnosed the problem
> correctly, solution is to remove TOS from ECMP hash calculation.

Agreed. This goes for the other bits too, so the whole byte must be excluded. For example, the OpenSSH client will by default change the code point from zero (during authentication) to af21 or cs1 (when it enters an interactive or non-interactive session, respectively).

I have experienced this breaking IPv6 SSH sessions to an anycasted SSH server instance that was reached through old Juniper DPC cards with ECMP enabled. The symptom was that authentication went fine, only for the connection to be reset immediately afterwards (unless the default IPQoS config was changed).

The «solution» was to simply disable ECMP for all IPv6 traffic, since I could not figure out how to make the Juniper exclude the DiffServ byte from the ECMP hash calculation.

Tore
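A toy illustration of the failure mode being discussed: if the ECMP hash covers the ToS byte, a flow whose ECN codepoint flips mid-connection (e.g. from ECT(0) to CE under congestion) can hash onto a different path. The CRC-based hash and 4-way ECMP here are stand-ins for whatever a real ASIC does.

```python
# Illustrative ECMP hash; real hardware uses vendor-specific hashes.
import zlib

def ecmp_hash(src, dst, proto, sport, dport, tos=None):
    key = f"{src}|{dst}|{proto}|{sport}|{dport}"
    if tos is not None:          # the broken variant includes the ToS byte
        key += f"|{tos}"
    return zlib.crc32(key.encode()) % 4  # pick one of 4 equal-cost paths

flow = ("2001:db8::1", "2001:db8::2", 6, 40000, 22)

# Same 5-tuple, ECN codepoint changes from ECT(0) (0b10) to CE (0b11):
broken = {ecmp_hash(*flow, tos=0b10), ecmp_hash(*flow, tos=0b11)}
fixed = {ecmp_hash(*flow), ecmp_hash(*flow)}

print(len(fixed))   # → 1: excluding ToS keeps the flow on one path
print(len(broken))  # may be 2: including ToS can split the flow
```

The same split happens when the DSCP bits change mid-flow, as in the OpenSSH example above.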
Re: Couple of questions about "baremetal/ONIE" networking equipment sellers
* Nick ten Cate

> We also have lots of experience with FS.com switches; however.. One thing we
> noticed really quick is that its better to order 1 and to find the actual
> supplier and order with them directly. FS.com is a reseller; and they will
> switch (no pun intended) supplier almost yearly. Real technical support is
> nonexistent (even though they claim it is great) and I have yet to have a
> single bug fixed; packet dumps and steps to reproduce included. I have
> removed all of our *N*5850-48S6Q due to bugs in software lockups.

Hi Nick,

FS.com did indeed replace their N5850-48S6Q supplier a while back. It is rather idiotic of them not to change their SKU when they do so. Anyway, before it was manufactured by Celestica, I think; now it is the Edge-Core AS5812-54X. The latter is very well supported by Cumulus, the former is not.

You can see it is the Edge-Core by comparing the pictures:

https://www.fs.com/de-en/products/69226.html
https://www.edge-core.com/productsInfo.php?cls=1=8=59=119

We bought a few of them. I did mail our AM before placing the order to ascertain that they would indeed deliver the AS5812-54X and to make it crystal clear that no other model would be accepted. No problem. They will also sell other Edge-Core models that are not (yet) in their website catalogue if you ask (we ordered a few AS7326-56Xes).

I do not believe Edge-Core will sell direct to end-users, so resellers like FS, Cumulus Networks or HPE are your best bet if you want those.

Tore
Re: BGP router question
* Art Stephens > Hope this is not too off topic but can any one advise if a Dell S4048-ON can > support full ebgp routes? As others have mentioned, you won't be able to program them all in the forwarding plane, but the control plane can receive them all just fine (it has more than enough RAM). If your use case allows for accepting a default route from your IP transit providers along with the full feed, you can easily implement control plane policies that ensure that what gets installed to the forwarding plane is only the routes to the destinations you care the most about + the default route to cover the long tail of traffic to the rest of the world. You can use the S4048-ON (or any equivalent layer-3 capable data centre switch) as a border router this way, at a fraction of what a big C or J router would cost you. We started doing this a few years back and we're not regretting it. https://labs.spotify.com/2016/01/27/sdn-internet-router-part-2/ https://www.redpill-linpro.com/sysadvent/2016/12/09/slimming-routing-table.html Tore
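The selective FIB-install idea described above can be sketched as follows; the prefixes and traffic shares are invented for illustration, and a real implementation would do this via routing policy rather than a script.

```python
# Sketch: program specifics only for the heaviest destinations and
# let a default route cover the long tail. Numbers are invented.
FIB_LIMIT = 2  # specifics we can afford to program besides the default

traffic = {                 # % of egress traffic per prefix
    "192.0.2.0/24": 40.0,
    "198.51.100.0/24": 35.0,
    "203.0.113.0/24": 15.0,
    "100.64.0.0/24": 10.0,
}
top = sorted(traffic, key=traffic.get, reverse=True)[:FIB_LIMIT]
fib = ["0.0.0.0/0"] + top   # the default covers everything else
print(fib)       # → ['0.0.0.0/0', '192.0.2.0/24', '198.51.100.0/24']
covered = sum(traffic[p] for p in top)
print(covered)   # → 75.0 (% of traffic following a specific route)
```

Even a small number of well-chosen specifics typically covers the bulk of the traffic; everything else takes the default towards transit.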
Re: Gi Firewall for mobile subscribers
* Mark Milhollan > On Thu, 11 Apr 2019, Tore Anderson wrote: > >> We've been wanting to replace our all of our ad-hoc OOB links with a >> standardised setup based on LTE connectivity to an embedded >> login/console server at each PoP. IPv6 would be perfect due to no >> CGNAT and infinitesimal levels of background scanning. >> >> Unfortunately Telenor has decided to deploy a central firewall that >> drops all inbound connections, making their service totally unusable >> for our use case. I guess they don't want our money. > > Sounds like the console server will need to "phone home". That a workaround > might be possible doesn't make a firewall which the user cannot control to > some degree less annoying. Though it might be that Telenor just needs to be > notified/reminded that power users and business customers exist. Phoning home is not an option here, as the whole point is to have an OOB backdoor that works even if «home» is totally FUBAR. For that reason it needs to be completely independent of the production network. Standard Internet connections are perfect, IFF they are bi-directional. Tore
Re: Gi Firewall for mobile subscribers
* Owen DeLong

> What would be the process for a subscriber who wishes to allow inbound
> connections?
>
> If you are simply saying that as a customer of your ISP you simply can’t
> allow inbound IPv6 connections at all, then you are becoming a very poor
> substitute for an ISP IMHO.

I have to agree with this.

We've been wanting to replace all of our ad-hoc OOB links with a standardised setup based on LTE connectivity to an embedded login/console server at each PoP. IPv6 would be perfect due to no CGNAT and infinitesimal levels of background scanning.

Unfortunately Telenor has decided to deploy a central firewall that drops all inbound connections, making their service totally unusable for our use case. I guess they don't want our money.

Maybe with EU RLAH I could simply find another more suitable provider abroad. Maybe I'd even get vPLMN redundancy that way. Hmm...

Tore
Re: ICMPv6 "too-big" packets ignored (filtered ?) by Cloudflare farms
* Jean-Daniel Pauget

> I confess using IPv6 behind a 6in4 tunnel because the "Business-Class"
> service
> of the concerned operator doesn't handle IPv6 yet.
>
> as such, I realised that, as far as I can figure, ICMPv6 packet "too-big"
> (rfc 4443)
> seem to be ignored or filtered at ~60% of ClouFlare's http farms
>
> as a result, random sites such as http://nanog.org/ or
> https://www.ansible.com/
> are badly reachable whenever small mtu are involved ...

Hi Jean-Daniel.

If you're using tunnels, you'll want to have your tunnel endpoint adjust down the TCP MSS value to match the MTU of the tunnel interface. That way, you'll avoid problems with Path MTU Discovery.

Even in those situations where PMTUD does work fine, doing TCP MSS adjustment will improve performance, as the server does not need to spend an RTT to discover your reduced MTU. (This isn't really an IPv6 issue, by the way - ISPs using PPPoE will typically perform MSS adjustment for IPv4 packets too.)

If you're using Linux as your tunnel endpoint, try:

ip6tables -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu

Tore
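For reference, the clamped MSS is just the tunnel MTU minus the fixed IPv6 and TCP header sizes (ignoring TCP options, which shave off a bit more). A 6in4 tunnel over a 1500-byte link typically has a 1480-byte MTU:

```python
# Back-of-the-envelope MSS calculation for an IPv6-over-tunnel path.
IPV6_HEADER = 40  # fixed IPv6 header, bytes
TCP_HEADER = 20   # TCP header without options, bytes

def clamped_mss(tunnel_mtu: int) -> int:
    """Largest TCP segment that fits in one packet on the tunnel."""
    return tunnel_mtu - IPV6_HEADER - TCP_HEADER

print(clamped_mss(1480))  # → 1420
```

This is the value `--clamp-mss-to-pmtu` effectively writes into the SYN when the route's MTU is 1480.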
Re: BGP Experiment
* Job Snijders > Given the severity of the bug, there is a strong incentive for people to > upgrade ASAP. The buggy code path can also be disabled without upgrading, by building FRR with the --disable-bgp-vnc configure option, as I understand it. I've been told that this is the default in Cumulus Linux. Tore
Re: Most peered AS per country
* Mehmet Akcin

> I am noticing provider A enters market X saying they are tier 1 network but
> they do not have a single peering session in country and they backhaul
> everything back to market Z where they deliver traffic to the peer via high
> latency and low performance method. This is causing market to receive pricing
> targets which are unrealistic and hurting telecoms who are genuinely trying
> to do right thing and establish in country direct peering with peers.

Yeah, don't fall for the marketing hyperbole. A transit provider's «tier» is an extremely poor indicator of its interconnectedness and quality, especially if your traffic is regional in nature.

In most cases you'll be much better off buying your IP transit from a regional «tier-2» provider, which tends to give you much better connectivity to other networks in your region - in addition to all the global connectivity that the «tier-2»'s upstream(s) provide, of course.

Tore
Re: China ’s Maxim – Leave No Access Point Unexploited: The Hidden Story of China Telecom’ s BGP Hijacking
* Harley H

> Curious to hear others' thoughts on this.
> https://scholarcommons.usf.edu/cgi/viewcontent.cgi?article=1050=mca
>
> This paper presents the view that several BGP hijacks performed by China
> Telecom had malicious intent. The incidents are:
> * Canada to Korea - 2016
> * US to Italy - Oct 2016
> * Scandinavia to Japan - April-May 2017
> * Italy to Thailand - April-July 2017
>
> The authors claim this is enabled by China Telecom's presence in North
> America.

Hi,

I looked a bit into the Scandinavia to Japan claim last week for a Norwegian journalist, who obviously found this rather sensational claim very intriguing. The article (Norwegian, but Google Translate does a decent job) is found at https://www.digi.no/artikler/internettrafikk-fra-norge-og-sverige-ble-kapret-og-omdirigert-til-kina/449797?key=vS1EOiG1 in case you're interested.

From what I can tell from looking at routeviews data from the period, what happened was that SK Broadband (AS9318) was leaking a bunch of routes to China Telecom (AS4134). The leak included the transit routes from SKB's upstream Verizon (AS703) and customers of theirs in turn, including well-known organisations such as Bloomberg (AS10361) and Time Warner (AS36032), which I suppose might be the ones the paper is referring to.

The routes in question then propagated from CT to Telia Carrier (AS1299), probably in North America somewhere. Scandinavia is TC's home turf, so it makes sense that the detour via CT was easily observed from here.

If you want to see for yourself, look for «1299 4134 9318 703» in http://archive.routeviews.org/route-views.linx/bgpdata/2017.04/RIBS/rib.20170430.2200.bz2

Anyway, in my opinion the data for this particular incident (I haven't looked into the other three) does not indicate foul play on CT's behalf, but rather a pretty standard leak by SKB followed by sloppy filtering by CT and TC both.

Tore
Re: Cloudflare 1.1.1.1 public DNS different as path info for 1.0.0.1 and 1.1.1.1 london
* Marty Strong via NANOG

> Routing from ~150 locations, plenty of redundancy.

Any plans to support NSID and/or "hostname.bind" to allow clients to identify which node is serving their requests? For example:

$ dig @nsb.dnsnode.net. hostname.bind. CH TXT +nsid
[...]
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; NSID: 73 34 2e 6f 73 6c ("s4.osl")

;; QUESTION SECTION:
;hostname.bind.			CH	TXT

;; ANSWER SECTION:
hostname.bind.		0	CH	TXT	"s4.osl"
[...]

Tore
Re: Xbox Live and Teredo
* Martin List-Petersen

> Your best bet: set up a Terredo gateway and facilitate these Xboxes as
> long as you don't give them native IPv6.

This is unlikely to help, as the XB1 doesn't use Teredo relays at all.

The XB1 uses Teredo to facilitate direct p2p communication between IPv4 consoles only. Essentially it is used as an IPv4 NAT traversal mechanism. Its Teredo implementation does not allow communication between IPv4 and IPv6 peers. This is the only communication pattern which would normally require a third-party Teredo relay.

This unfortunately means that provisioning IPv6 is also unlikely to help, unless you're in a position to provision it to both peers. See: https://www.ietf.org/proceedings/88/slides/slides-88-v6ops-0.pdf

Personally I'd start out by verifying the connectivity to and functionality of Microsoft's Teredo servers, which are used for NAT address discovery and port mapping during tunnel setup (unlike Teredo relays, Teredo servers aren't part of the Teredo «forwarding plane»).

Tore
Re: BGP peering question
* craig washington > Newbie question, what criteria do you look for when you decide that > you want to peer with someone or if you will accept peering with > someone from an ISP point of view. Routing hygiene. I expect the would-be peer to keep the number of advertised routes that are either 1) not registered in RIPE/RADB, 2) disaggregated, or 3) redundant (i.e., more-specifics of larger advertisements) to an absolute minimum. Tore
Re: difference with caching when connected to telia internet
Hi Aaron, > What happened was, when I turned up my new 10 gig Telia Internet > connection a few days ago, I needed to balance out my (4) 10 gig > internet connections so I chopped up a /17 into (4) /19's. When I > did this, I was still advertising the /17 to my local caches, but I > was advertising the (4) /19's , one on each of my (4) 10 gig internet > connections. So the caches out on the public internet were learning > more specific prefixes (longer masks) then my local caches were > learning... so the caches on the internet were being used instead of > my local caches. Once google and Netflix tech support helped to make > me aware of this, I correctly sent the additional (4) /19's to my > caches and now all is well. Please instead advertise the /17 on all your Telia uplinks. You should *additionally* advertise the four /19s to the different links, but make sure to tag them with the NO_EXPORT community so they don't propagate outside Telia. That way you get the traffic engineering you want (i.e., load balancing of ingress traffic from Telia), while at the same time avoiding coming across as a self-serving jerk to everyone else on the Internet by not polluting their routing tables/FIBs with four entirely superfluous /19s. Tore
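For the record, a /17 carves into exactly four /19s (prefix length +2), which is easy to sanity-check with Python's ipaddress module. The prefix below is from the RFC 2544 benchmarking range, standing in for Aaron's real aggregate; the well-known NO_EXPORT community, should you need to set it numerically, is 65535:65281.

```python
# Split an aggregate /17 into the four /19s to be advertised (tagged
# NO_EXPORT) on the individual uplinks. Prefix is a stand-in.
import ipaddress

aggregate = ipaddress.ip_network("198.18.0.0/17")
subnets = [str(s) for s in aggregate.subnets(prefixlen_diff=2)]
print(subnets)
# → ['198.18.0.0/19', '198.18.32.0/19', '198.18.64.0/19', '198.18.96.0/19']
```

The /17 itself goes out everywhere untagged, so the rest of the Internet only ever sees one route.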
Re: External BGP Controller for L3 Switch BGP routing
* Saku Ytti

> On Fri, 5 Jun 2020 at 11:23, Tore Anderson wrote:

> Imagine content network having 40Gbps connection, and client having
> 10Gbps connection, and network between them is lossless and has RTT of
> 200ms. To achieve 10Gbps rate receiver needs 10Gbps*200ms = 250MB
> window, in worst case 125MB window could grow into 250MB window, and
> sender could send the 125MB at 40Gbps burst.
> This means the port receiver is attached to, needs to store the 125MB,
> as it's only serialising it at 10Gbps. If it cannot store it, window
> will shrink and receiver cannot get 10Gbps.
>
> This is quite pathological example, but you can try with much less
> pathological numbers, remembering TridentII has 12MB of buffers.

I totally get why the receiver needs bigger buffers if he's going to shuffle that data out another interface with a slower speed. But when you're a data centre operator you're (usually anyway) mostly transmitting data. And you can easily ensure the interface speed facing the servers can be the same as the interface speed facing the ISP.

So if you consider this typical spine/leaf data centre network topology (essentially the same one I posted earlier this morning):

(Server) --10GE--> (T2 leaf X) --40GE--> (T2 spine) --40GE--> (T2 leaf Y) --10GE--> (IP-transit/"the Internet") --10GE--> (Client)

If I understand you correctly you're saying this is a "suspect" topology that cannot achieve 10G transmission rate from server to client (or from client to server for that matter) because of small buffers on my "T2 leaf Y" switch (i.e., the one which has the Internet-facing interface)?
If so would it solve the problem just replacing "T2 leaf Y" with, say, a Juniper MX or something else with deeper buffers? Or would it help to use (4x)10GE instead of 40GE for the links between the leaf and spine layers too, so there was no change in interface speeds along the path through the data centre towards the handoff to the IPT provider? Tore
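Saku's window arithmetic above is easy to verify. A quick sketch (the 250 MB, 125 MB and 12 MB figures are the ones from his example; `bdp_bytes` is just my helper name):

```python
# Bandwidth-delay product: the window needed to keep a path "full".
# 10 Gb/s at 200 ms RTT needs a 250 MB window; a half-grown 125 MB
# window can double and be burst out at 40 Gb/s, and the 10GE egress
# port has to absorb whatever it cannot serialise in time.

def bdp_bytes(rate_bps: float, rtt_s: float) -> float:
    """Window size in bytes required to sustain rate_bps over rtt_s."""
    return rate_bps * rtt_s / 8

window = bdp_bytes(10e9, 0.200)   # 250 MB, as in Saku's example
burst = window / 2                # the half-grown 125 MB window
trident2 = 12e6                   # ~12 MB of on-chip buffer, per the thread

print(window / 1e6, burst / 1e6)  # 250.0 125.0
print(burst > trident2)           # True: far more than the leaf can buffer
```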
Re: External BGP Controller for L3 Switch BGP routing
* Saku Ytti > Why I said it won't be a problem inside DC, is because low RTT, which > means small bursts. I'm talking about backend network infra in DC, not > Internet facing. Anywhere where you'll see large RTT and > speed/availability step-down you'll need buffers (unless we change TCP > to pace window-growth, unlike burst what it does now, AFAIK, you could > already configure your Linux server to do pacing at estimate BW, but > then you'd lose in congested links, as more aggressive TCP stack would > beat you to oblivion). But here you're talking about the RTT of each individual link, right, not the RTT of the entire path through the Internet for any given flow? Put it another way, my «Internet facing» interfaces are typically 10GEs with a few (kilo)metres of dark fibre that x-connects into my IP-transit providers' routers sitting in nearby rooms or racks (worst case somewhere else in the same metro area). Is there any reason why I should need deep buffers on those interfaces? The IP-transit providers might need the deep buffers somewhere in their networks, sure. But if so I'm thinking that's a problem I'm paying them to not have to worry about. BTW, in my experience the buffering and tail-dropping is actually a bigger problem inside the data centre because of distributed applications causing incast. So we get workarounds like DCTCP and BBR, which are apparently cheaper than using deep-buffer switches everywhere. Tore
Re: External BGP Controller for L3 Switch BGP routing
Hi Saku, > > https://www.redpill-linpro.com/sysadvent/2016/12/09/slimming-routing-table.html > > --- > As described in a previous post, we’re testing a HPE Altoline 6920 in > our lab. The Altoline 6920 is, like other switches based on the > Broadcom Trident II chipset, able to handle up to 720 Gbps of > throughput, packing 48x10GbE + 6x40GbE ports in a compact 1RU chassis. > Its price is in all likelihood a single-digit percentage of the price > of a traditional Internet router with a comparable throughput rating. > --- > > This makes it sound like small-FIB router is single-digit percentage > cost of full-FIB. Do you know of any traditional «Internet scale» router that can do ~720 Gbps of throughput for less than 10x the price of a Trident II box? Or even <100kUSD? (Disregarding any volume discounts.) > Also having Trident in Internet facing interface may be suspect, > especially if you need to go from fast interface to slow or busy > interface, due to very minor packet buffers. This obviously won't be > much of a problem in inside-DC traffic. Quite the opposite, changing between different interface speeds happens very commonly inside the data centre (and most of the time it's done by shallow-buffered switches using Trident II or similar chips). One ubiquitous configuration has the servers and any external uplinks attached with 10GE to leaf switches, which in turn connect to a 40GE spine layer. In this config server<->server and server<->Internet packets will need to change speed twice: [server]-10GE-(leafX)-40GE-(spine)-40GE-(leafY)-10GE-[server/internet] I suppose you could for example use a couple of MX240s or something as a special-purpose leaf layer for external connectivity. MPC5E-40G10G-IRB or something towards the 40GE spines and any regular 10GE MPC towards the exits. That way you'd only have one shallow-buffered speed conversion remaining. But I'm very sceptical that something like this makes sense after taking the cost/benefit ratio into account. Tore
Re: Advertising rented IPv4 prefix from a different ASN.
* Mark Tinka > On 5/Aug/16 15:40, Soon Keat Neo wrote: > > > If you are just announcing more specific address space that you've > > obtained legitimately off their assigned address space, it should > > be no problem, just obtain an LoA and register it on the different > > databases and you should be set to ask your upstreams to allow the > > announcements. > > Do people actually do this? Just as an example: There are hundreds of more-specifics coming out of 8/8 that have a different origin AS than 8/8 itself, so yes, people do. Tore
Re: MTU
* Baldur Norddahl > I did not say we were doing internet peering... Uhm. When you say that you peer with another ISP (and keep in mind what the "I" in ISP stands for), while giving no further details, then folks are going to assume that you're talking about a standard eBGP peering with inet/inet6 unicast NLRIs. > In case you are wondering, we are actually running L2VPN tunnels over > MPLS. Okay. Well, I see no reason why using GRE tunnels for this purpose shouldn't work, it does for us (using mostly VPLS and Martini tunnels). That said, I've never tried extending our MPLS backbone outside of our own administrative domain or autonomous system. That sounds like a really scary prospect to me, but I'll admit I've never given serious consideration to such an arrangement before. Hopefully you know what you're doing. Tore
Re: MTU
* Baldur Norddahl > What is best practice regarding choosing MTU on transit links? > > Until now we have used the default of 1500 bytes. I now have a > project where we peer directly with another small ISP. However we need > a backup so we figured a GRE tunnel on a common IP transit carrier > would work. We want to avoid the troubles you get by having an > effective MTU smaller than 1500 inside the tunnel, so the IP transit > carrier agreed to configure a MTU of 9216. Your use case as described above puzzles me. You should already see your peer's routes being advertised to you via the transit provider and vice versa. If your direct peering fails, the traffic should start flowing via the transit provider automatically. So unless there's something else going on here that you're not telling us, there should be no need for the GRE tunnel. That said, it should work, as long as the MTU is increased at both ends and the transit network guarantees it will transport the jumbos. We're doing something similar, actually. We have multiple sites connected with either dark fibre or DWDM, but not always in a redundant fashion. So instead we run GRE tunnels through transit (with increased MTU) between selected sites to achieve full redundancy. This has worked perfectly so far. It's only used for our intra-AS IP/MPLS traffic though, not for eBGP like you're considering. > Obviously I only need to increase my MTU by the size of the GRE > header. But I am thinking is there any reason not to go all in and > ask every peer to go to whatever max MTU they can support? My own > equipment will do MTU of 9600 bytes. I'd say it's not worth the trouble unless you know you're going to use it for anything. If I was your peer I'd certainly need you to give me a good reason why I should deviate from my standard templates first... > On the other hand, none of my customers will see any actual difference > because they are end users with CPE equipment that expects a 1500 > byte MTU.
Trying to deliver jumbo frames to the end users is probably > going to end badly. Depends on the end user, I guess. Residential? Agreed. Business? Who knows - maybe they would like to run fat GRE tunnels through your network? In any case: 1500 by default, other values only by request. Tore
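For what it's worth, the arithmetic behind "increase my MTU by the size of the GRE header" is trivial to sketch, assuming plain GRE over IPv4 with none of the optional header fields enabled:

```python
# Plain GRE over IPv4 adds 24 bytes: a 20-byte outer IPv4 header plus
# the 4-byte base GRE header (the key/checksum/sequence options, if
# enabled, each add more). To carry untruncated 1500-byte packets
# inside the tunnel, the carrier path must deliver at least 1524 bytes.

OUTER_IPV4 = 20   # outer delivery header
GRE_BASE = 4      # GRE header without optional fields

def required_carrier_mtu(inner_mtu: int) -> int:
    return inner_mtu + OUTER_IPV4 + GRE_BASE

print(required_carrier_mtu(1500))   # 1524, comfortably within the 9216 offered
```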
Re: IPv6 Deployment for Mobile Subscribers
* Baldur Norddahl > Den 22. jul. 2016 20.25 skrev "Ca By": > > > Phones, as in 3gpp? If so, each phone alway gets a /64, there is > > no choice. > > > > https://tools.ietf.org/html/rfc6459 > > Here the cell companies are marketing their 4G LTE as an alternative > to DSL, Coax and fiber for internet access in your home with a 4G > wifi router. If they can not do prefix delegation it is no > alternative! Actually, that /64 prefix is delegated, after a fashion. RFC 7278. That said, according to RFC 6459 section 5.3, full DHCPv6-PD support was specified in 3GPP Rel-10. Not sure if there are production deployments of that yet though, and if not how far off they are. But at least it looks like it's coming. Tore
Re: IPv6 deployment excuses
* Mark Tinka > What I was trying to get to is that, yes, running a single-stack is > cheaper (depending on what "cheaper" means to you) than running > dual-stack. Wholeheartedly agreed. > That said, running IPv4-only means you put yourself at a disadvantage > as IPv6 is now where the world is going. Also wholeheartedly agreed. > Similarly, running IPv6-only means you still need to support access to > the IPv4-only Internet anyway, if you want to have paying customers or > happy users. > > So the bottom line is that for better or worse, any progressive > network in 2016 is going to have to run dual-stack in some form or > other for the foreseeable future. So the argument on whether it is > cheaper or more costly to run single- or dual-stack does not change > that fact if you are interested in remaining a going concern. My point is that as a content provider, I only need a dual-stacked façade. That can easily be achieved using, e.g., protocol translation at the outer border of my network. The inside of my network, where 99.99% of all the complexity, devices, applications and so on reside, can be single stack IPv6-only today. Thus I get all the benefits of running a single stack network, minus some fraction of a percent needed to operate the translation system. (I could in theory get rid of that too by outsourcing it somewhere.) Tore
Re: IPv6 deployment excuses
* Mark Tinka > I understand your points - to your comment, my question is around > whether it is cheaper (for you) to just run IPv6 in lieu of IPv6 and > IPv4. We've found that it is. IPv6-only greatly reduces complexity compared to dual stack. This means higher reliability, lower OpEx, shorter recovery time when something does go wrong anyway, fewer SLA violations, happier customers, and so on - the list goes on and on. Single stack is essentially the KISS option. It also means that we'll essentially never have to perform IPv4 renumbering exercises in order to accommodate growth. Those tend to be very costly due to the man-hours required for planning and implementation. Besides, it means we don't need IPv4 to number customer infrastructure. As you probably know, IPv4 numbers have a real cost these days. My point of view is ASP/MSP/data centre stuff. I know I'm not alone in going down the IPv6 road here, though. Facebook is another prominent example. Other operators in different market segments are also doing IPv6-only. Kabel Deutschland and T-Mobile US, for example. I'm guessing they have similar motivations. Tore
Re: Netflix VPN detection - actual engineer needed
* Davide Davini> On 04/06/2016 20:46, Owen DeLong wrote: > > Get your own /48 and advertise to HE Tunnel via BGP. Problem > > solved. > > Even though that sounds like an awesome idea it does not seem trivial > to me to obtain your own /48. Which is a good thing, as every new PI /48 advertised to the DFZ will bloat the routing tables of thousands upon thousands of routers world wide. It might solve the Netflix problem, but what has actually happened is that you've split the original problem into a thousand small bits and thrown one piece into each of your neighbours' gardens. I'd encourage everyone to try to fix their Netflix problem a more proper way before deciding to litter everyone else's routing tables with another PI prefix. Blocking access to Netflix via the tunnel seems like an obvious solution to me, for what it's worth. I wonder if anyone has attempted to estimate approx. how much RIB/FIB space a single DFZ route requires in total across the entire internet... Tore
Re: Netflix VPN detection - actual engineer needed
* Spencer Ryan > As an addendum to this and what someone said earlier about the > tunnels not being anonymous: From Netflix's perspective they are. Yes > HE knows who controls which tunnel, but if Netflix went to HE and > said "Tell me what user has x/48" HE would say "No". Thus, making > them an effective anonymous VPN service from Netflix's perspective. Every ISP would say «No» to that question. In sane jurisdictions only law enforcement has any chance of getting that answer (hopefully only if they have a valid mandate from some kind of court). But Netflix shouldn't have any need to ask in the first place. Their customers need to log in to their own personal accounts in order to access any content; when they do, Netflix can discover their addresses. Tore
Re: Public DNS64
* Mark Andrews > In message <20160601103707.7de9d...@envy.e5.y.home>, Tore Anderson writes: > > Or you could simply accept that active sessions are torn down > > whenever the routing topology changes enough to flip traffic to the > > anycast prefix to another NAT64 instance in a different region. > > > > It would be no different from any other anycasted service. > > But some services are inherently short lived. NAT64 has no such > property. Well, yes - it depends on the service/application, right? That is, anycasted_${service} will work pretty much the same as ${service}_via_anycasted_nat64 for most values of ${service}. Assuming that: 1) most of your customer's sessions are short-lived and/or their applications can handle failures reasonably gracefully, and/or 2) you have a stable and well-designed network where you can be reasonably certain that the traffic from clients in city/region/country X is going to consistently be routed to the NAT64 instance in city/region/country X: ...you will have very little to gain by setting up some complicated NAT64 session replication scheme to city/region/country Y, Z, and so on. KISS: Just use different IPv4 source address pools in each location and accept that any long-lived sessions are interrupted when your routing turns really wonky once in a blue moon. If on the other hand you cannot under any circumstance accept disruption to existing sessions, you probably don't want to be using any form of NAT in the first place. It's not like anycast routes flipping is the only reason why sessions through a NAT can be disrupted. In that case, native IPv6 is probably better, or possibly MAP if you have no control over the (presumably IPv4-only) remote ends of those sessions. Tore
Re: Public DNS64
* Baldur Norddahl > It goes to the USA and back again. They would need NAT64 servers in > every region and then let the DNS64 service decide which one is close > to you by encoding the region information in the returned IPv6 > address. Such as 2001:470:64:[region number]::/96. > > An anycast solution would need a distributed NAT64 implementation, > such that the NAT64 servers could somehow synchronize state. Or you could simply accept that active sessions are torn down whenever the routing topology changes enough to flip traffic to the anycast prefix to another NAT64 instance in a different region. It would be no different from any other anycasted service. Tore
Re: IPV6 planning
* Saku Ytti > Yes, SLAAC, 4862 clearly does not forbid it, and there is no > technical reason. But as you state, 2464 does not specify other > behaviour. Writing new draft which specifies behaviour for arbitrary > size wouldn't be a challenge, marketing it might be. FYI: RFC 7421 is an in-depth discussion of the fixed 64-bit boundary. Tore
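For reference, the fixed /64 boundary is what makes the RFC 2464 Modified EUI-64 derivation work: the 48-bit MAC is split, 0xfffe is wedged in the middle, and the universal/local bit is flipped. A sketch (function name and example MAC are mine):

```python
# Modified EUI-64: derive the 64-bit SLAAC interface identifier from a
# 48-bit MAC address (RFC 2464 / RFC 4291 Appendix A): insert ff:fe
# between the OUI and the NIC-specific bytes, and flip bit 0x02 (the
# universal/local bit) of the first byte.

def eui64_iid(mac: str) -> str:
    b = bytes(int(octet, 16) for octet in mac.split(":"))
    iid = bytes([b[0] ^ 0x02]) + b[1:3] + b"\xff\xfe" + b[3:6]
    # Render as the four 16-bit groups of the lower half of the address.
    return ":".join(f"{(iid[i] << 8) | iid[i + 1]:x}" for i in range(0, 8, 2))

print(eui64_iid("00:25:90:aa:bb:cc"))   # 225:90ff:feaa:bbcc
```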
Re: The IPv6 Travesty that is Cogent's refusal to peer Hurricane Electric - and how to solve it
William, > Don't get me wrong. You can cure this fraud without going to extremes. > An open peering policy doesn't require you to buy hardware for the > other guy's convenience. Let him reimburse you or procure the hardware > you spec out if he wants to peer. Nor do you have to extend your > network to a location convenient for the other guy. Pick neutral > locations where you're willing to peer and let the other guy build to > them or pay you to build from there to him. Nor does an open peering > policy require you to give the other guy a free ride on your > international backbone: you can swap packets for just the regions of > your network in which he's willing to establish a connection. But not > ratios and traffic minimums -- those are not egalitarian, they're > designed only to exclude the powerless. > > Taken in this context, the Cogent/HE IPv6 peering spat is very simple: > Cogent is -the- bad actor. 100%. I'm curious: How do you know that Cogent didn't offer to peer under terms such as the ones you mention, but that those were refused by HE? Tore
Re: The IPv6 Travesty that is Cogent's refusal to peer Hurricane Electric - and how to solve it
* Ca By> Selling a service that is considered internet but does not deliver > full internet access is generally considered properly bad. > > I would not do business with either company, since neither of them > provide a full view. +1 Both networks are in a position to easily remedy the situation if they were pragmatically inclined. For example, Cogent could simply accept HE's offer to peer; HE could simply pick up Cogent's IPv6 routes from their existing transit provider TSIC. Instead they both choose to continue their game of chicken to the detriment of both of their customer bases. Fortunately there's no shortage of competitors to HE and Cogent who prioritise providing connectivity higher than engaging in such nonsense. Vote with your wallets, folks. Tore
Re: Another Big day for IPv6 - 10% native penetration
* Sander Steffann > > We just need Google to announce that IPv6 enabled sites will get a > > slight bonus in search rankings. And just like that, there will > > suddenly be a business reason to implement IPv6. > > I already discussed that with them a long time ago, but they weren't > convinced. Maybe now is the time to discuss it again :) I've mentioned this in other forums before, but I might as well repeat it here too: I can understand that Google (or Netflix for that matter) are reluctant to engage in pure IPv6 activism by providing different or improved content to users who have no IPv6 connectivity. However, maybe they'd be more open to the idea if it was limited to IPv6 clients only? That is, IFF the Google user submitting the search is doing it using IPv6, then consider the result entries' IPv6 availability when sorting the result set. My reasoning is that there would be an objective technical reason for doing it. The client is demonstrably capable of using IPv6 and prefers to do so, and as it has been shown that IPv6 performs better than IPv4 (see e.g. https://youtu.be/_7rcAIbvzVY), giving priority to IPv6-enabled results seems a logical thing to do. Much in the same way that it makes sense to rank mobile-optimised sites high in result sets returned to mobile clients. I'd imagine that the promise of improved Google ratings for 10%/25% of global/U.S. users will still be a significant enough business reason for web site operators to seriously consider implementing IPv6. Tore
Re: Production-scale NAT64
* Mark Tinka mark.ti...@seacom.mu On 27/Aug/15 07:16, Mark Andrews wrote: Or why you are looking at NAT64 instead of DS-Lite, MAP-E, or MAP-T all of which are better solutions than NAT64. NAT64 + DNS64 which breaks DNSSEC. Because with NAT64/DNS64/464XLAT, there isn't any undo work after the dust settles. Hi Mark, There's not much difference between 464XLAT and MAP-*/DS-Lite/lw4o6 in this respect, the way I see it. In all cases you need four things: 0) Native IPv6. 1) A central component connected to the IPv4 internet and the IPv6 access network (464XLAT: PLAT, MAP-*: BR, DS-Lite/lw4o6: AFTR). 2) Signalling to the client that #1 exists and can be used (464XLAT: DNS64, others: DHCPv6 options). 3) A distributed component at the customer premise/nodes that acts on #2 and connects an isolated IPv4 network to the IPv6 access network (464XLAT: CLAT, MAP-*: CE, DS-Lite/lw4o6: B4). The necessary undo work in all cases is to disable #2. At that point components #1 and #3 will become un-used and can be removed if you care. My guess is that you'll care about removing #1 because it probably uses power and space in your PoP, but that you won't care about #3 because that's just an unused software function residing in a customer device you might not even have management access to. I'll grant you that with NAT64/DNS64 *without* 464XLAT there is no #3 to remove as part of your undo work, but as I mentioned above I doubt you'll care about that particular distinction. Besides, since a CLAT is included by default in multiple client platforms, you can't really prevent your users from using 464XLAT if you're providing NAT64/DNS64 to begin with, unless you're doing something really weird like disabling DNS64 for the ipv4only.arpa. hostname specifically. Tore
Re: Production-scale NAT64
Hi Mark, * Mark Tinka mark.ti...@seacom.mu In our deployment, we do not offer customers private IPv4 addresses. I suppose we can afford to do this because a) we still have lots of public IPv4, b) we are not a mobile carrier. So any of our customers with IPv4 will never hit the NAT64 gateway. When we do run out of public IPv4 addresses (and cannot get anymore from AFRINIC), all new customers will be assigned IPv6 addresses. Why wait until then? Any particular reason why you cannot already today provide IPv6 addresses to your [new] customers in parallel with IPv4? Tore
Re: Production-scale NAT64
* William Herrin On Thu, Aug 20, 2015 at 1:22 PM, Ca By cb.li...@gmail.com wrote: On Thu, Aug 20, 2015 at 9:36 AM, William Herrin b...@herrin.us wrote: Seriously though, if you want to run a v6-only network and still support access to IPv4 Internet resources, consider 464XLAT or DS-Lite. NAT64 is a required component of 464XLAT. Sort of, technically, but not really. Yes really. See below. 464XLAT does not require DNS64 and provides client software with an IPv4 interface. IPv4 software that has no idea IPv6 exists sends IPv4 packets which get translated to IPv6 packets. Those packets are routed to the carrier NAT box which then translates these specially crafted IPv6 packets back to IPv4 packets. What do you think the «carrier NAT box» in 464XLAT is, exactly? No need to guess, we can check the 464XLAT specification: http://tools.ietf.org/html/rfc6877#section-2 PLAT: PLAT is provider-side translator (XLAT) that complies with [RFC6146]. It translates N:1 global IPv6 addresses to global IPv4 addresses, and vice versa. Let's check that reference: http://tools.ietf.org/html/rfc6146#section-1 This document specifies stateful NAT64, a mechanism for IPv4-IPv6 transition and IPv4-IPv6 coexistence. Lo and behold! Your 464XLAT «carrier NAT box» (a.k.a. «PLAT») *is* a NAT64 box. Thus, if you intend to deploy 464XLAT in production, you're going to need a production-scale NAT64 implementation. To answer Jawaid's original question, I'm very happy with Jool (http://jool.mx) for my NAT64 (and SIIT) needs, which is an open-source Linux-based software solution. It has no problems handling several Gb/s of traffic using an x86 server that's a couple of years old without any tuning, so if the capacity required is moderate this might be a cost-effective alternative to dedicated boxes from one of the router/network appliance vendors. Tore
Re: Remember Internet-In-A-Box?
* Owen DeLong o...@delong.com On Jul 15, 2015, at 08:57 , Matthew Kaufman matt...@matthew.at wrote: This is only true for dual-stacked networks. I just tried to set up an IPv6-only WiFi network at my house recently, and it was a total fail due to non-implementation of relatively new standards... starting with the fact that my Juniper SRX doesn't run a load new enough to include RDNSS information in RAs, and some of the devices I wanted to test with (Android tablets) won't do DHCPv6. That’s a pretty old load then, as I’ve had RDNSS on my SRX-100 for several years now. Interesting. Which JUNOS version are you running, exactly? According to Juniper's web site, RDNSS support showed up in JUNOS 14.1, which isn't available for the SRX series (nor is any later version). http://www.juniper.net/techpubs/en_US/junos15.1/topics/reference/configuration-statement/dns-server-address-edit-protocols-router-advertisement.html Tore
Re: NTP versions in production use?
* Julien Goodwin Juniper have recently (15.1, still not out for all platforms) rebased JunOS on a slightly less ancient FreeBSD release, and nothing I have in my lab has it released yet, and I can't be bothered to go spelunking in the install image for what version of NTP it's running. FWIW: root@lab-ex4200:RE:1% ntpq -c rv status=06f4 leap_none, sync_ntp, 15 events, event_peer/strat_chg, version=ntpd 4.2.0-a Fri May 29 07:45:35 2015 (1), processor=powerpc, system=JUNOS15.1R1.8, leap=00, stratum=3, precision=-18, rootdelay=8.087, rootdispersion=52.195, peer=32436, refid=87.238.33.2, reftime=d94c85fa.7b317b80 Sun, Jul 12 2015 8:21:46.481, poll=10, clock=d94c8669.9b6e8a47 Sun, Jul 12 2015 8:23:37.607, state=4, offset=-1.039, frequency=-32.350, jitter=0.445, stability=0.040 It seems they've pulled the 15.1 release though, at least I can't download it anymore. Tore
Re: NTT-HE earlier today (~10am EDT)
* Mike Leber I was thinking that when I posted yesterday. These were announcements from a peer, not customer routes. We are lowering our max prefix limits on many peers as a result of this. We are also going towards more prefix filtering on peers beyond bogons and martians. Hi Mike, You're not mentioning RPKI here. Any particular reason why not? If I understand correctly, in today's leak the origin AS was changed/reset, so RPKI ought to have saved the day. (At least Grzegorz' day, considering that 33 of AS43996's prefixes are covered by ROAs.) Tore At Tue, 30 Jun 2015 10:27:21 +0200, Grzegorz Janoszka wrote: We have just received alert from bgpmon that AS58587 Fiber @ Home Limited has hijacked most of our (AS43996) prefixes and Hurricane Electric gladly accepted them.
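For readers unfamiliar with how origin validation would have caught this: RFC 6811 compares each route's origin AS and prefix length against the covering ROAs. A minimal sketch (the ROA entry and the function are illustrative, not real published data):

```python
import ipaddress

# RFC 6811 route origin validation, minimally: a route is "valid" if a
# covering ROA authorises its origin AS at that prefix length, "invalid"
# if ROAs cover the prefix but none authorises it, and "not-found" if no
# ROA covers it at all. A leak that resets the origin AS turns
# ROA-covered prefixes "invalid", which import filters can then drop.

def validate(prefix: str, origin_as: int, roas) -> str:
    net = ipaddress.ip_network(prefix)
    covered = False
    for roa_prefix, roa_as, max_length in roas:
        roa_net = ipaddress.ip_network(roa_prefix)
        if net.version == roa_net.version and net.subnet_of(roa_net):
            covered = True
            if origin_as == roa_as and net.prefixlen <= max_length:
                return "valid"
    return "invalid" if covered else "not-found"

roas = [("193.0.0.0/21", 3333, 21)]              # hypothetical ROA for illustration
print(validate("193.0.0.0/21", 3333, roas))      # valid
print(validate("193.0.0.0/21", 58587, roas))     # invalid: origin AS was reset
print(validate("198.51.100.0/24", 64496, roas))  # not-found: no covering ROA
```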
Re: REMINDER: LEAP SECOND
* Stefan Schlesinger s...@ono.at On 25 Jun 2015, at 03:14, Damian Menscher via NANOG nanog@nanog.org wrote: http://googleblog.blogspot.com/2011/09/time-technology-and-leaping-seconds.html comes dangerously close to your modest proposal. I wonder why Google hasn't published the patch yet. Leap smear sounds like the sane way to do leap seconds, and it wouldn't break software at all, because time adjustments in the sub-second area are proven to work quite well. It's implemented in chronyd versions 2.0 and up, for what it's worth. The required config directive is leapsecmode slew. There's a nice blog post explaining how this feature, as well as some other approaches on how to deal with the leap second, work here: http://developerblog.redhat.com/2015/06/01/five-different-ways-handle-leap-seconds-ntp/ Tore
Re: REMINDER: LEAP SECOND
* Harlan Stenn st...@ntp.org Matthew Huff writes: A backward step is a known issue and something that people are more comfortable dealing with as it can happen on any machine with a noisy clock crystal. A clock crystal has to be REALLY bad for ntpd to need to step the clock. Having 61 seconds in a minute or 86401 seconds in a day is a different story. Yeah, leap years suck too. And those jumps around daylight savings time. Hi Harlan, Leap years and DST adjustments have never caused us any major issues. It seems these code paths are well tested and work fine. The leap second in 2012 however ... total and utter carnage. Application servers, databases, etc. falling over like dominoes. All hands on deck in the middle of the night to clean up. It took days before we stopped finding broken stuff. Maybe all the bugs from 2012 have been fixed. Maybe they haven't. Maybe new ones have been introduced. I'm not terribly optimistic. One example I'm aware of: Cisco Nexus 5010/5020 switches need software that was released as late as the 29th of April this year in order to be immune to the crashburn leap second bug CSCub38654. The official «Cisco Suggested release based on software quality, stability and longevity» is older. Go figure. In any case, we're certainly not going to risk it. So our plan is to disconnect our local stratum-2s from their upstreams on June 29th so they (and more crucially, their downstream clients) remain oblivious to the leap second. Come July 1st, we'll reconnect them. The clients' clocks will be 1s (plus any drift) off at that point, but as we're running ntpd with the -x option, that shouldn't cause backwards stepping. Running with slightly incorrect clocks for a few days is a small price to pay to avoid a repeat of 2012's mayhem. Tore
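As an aside, the reason ntpd's -x option guarantees no backwards step is that corrections up to 600 s are slewed instead, and the slew is capped at 500 ppm. A back-of-envelope sketch of how long absorbing a 1 s post-reconnect offset would take (helper name is mine):

```python
# ntpd -x slews rather than steps (for offsets up to 600 s), and the
# slew is capped at 500 ppm, i.e. 0.5 ms of correction per second of
# real time. Absorbing a 1 s offset therefore takes about 33 minutes.

MAX_SLEW = 500e-6   # ntpd's maximum slew rate, 500 ppm

def slew_duration_s(offset_s: float) -> float:
    """Seconds of wall-clock time needed to slew away offset_s."""
    return offset_s / MAX_SLEW

print(slew_duration_s(1.0))   # 2000.0 seconds, roughly 33 minutes
```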
Re: REMINDER: LEAP SECOND
* Majdi S. Abbas On Wed, Jun 24, 2015 at 08:33:14AM +0200, Tore Anderson wrote: Leap years and DST adjustments have never caused us any major issues. It seems these code paths are well tested and work fine. I've seen quite a few people that for whatever reason insist on running systems in local time zones struggle with the DST reverse step. It's not nearly as much of a non-issue as you claim. Read again, and note the word «us». I am describing my and my employer's experience with past DST changes and leap years, and those have indeed been completely uneventful. YMMV. The leap second in 2012 however ... total and utter carnage. Application servers, databases, etc. falling over like dominoes. All hands on deck in the middle of the night to clean up. It took days before we stopped finding broken stuff. Total and utter carnage is a bit of a stretch. As above, I am speaking only about how the 2012 leap second went down in our infrastructure. I stand by how I described the event. Again, YMMV. If you plan to let your infrastructure deal with the upcoming leap second head-on, I wish you the best of luck. Hopefully all the bugs from 2012 have been fixed. I, however, certainly have no intention of being the one to find out otherwise. Tore
Re: REMINDER: LEAP SECOND
* Matthew Huff Does anyone know what the latest that we can run our NTP servers and not distribute the LEAP_SECOND flag to the NTP clients? From http://support.ntp.org/bin/view/Support/NTPRelatedDefinitions: Leap Indicator This is a two-bit code warning of an impending leap second to be inserted in the NTP timescale. The bits are set before 23:59 on the day of insertion and reset after 00:00 on the following day. This causes the number of seconds (rollover interval) in the day of insertion to be increased or decreased by one. So the answer to your question is, AIUI, 2015-06-29 23:59:59. Tore
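The Leap Indicator itself is just the top two bits of the first byte of the NTP packet header, so it is easy to spot on the wire. A sketch (names are mine):

```python
# The NTP header's first byte packs LI (2 bits), version (3 bits) and
# mode (3 bits). LI values: 0 = no warning, 1 = last minute of the day
# has 61 seconds, 2 = last minute has 59 seconds, 3 = unsynchronised.

LI_NAMES = {0: "no warning", 1: "insert (61s)", 2: "delete (59s)", 3: "unsynchronised"}

def leap_indicator(first_byte: int) -> str:
    return LI_NAMES[(first_byte >> 6) & 0x3]

# 0x64 = LI 1, version 4, mode 4 (server): a response carrying the warning.
print(leap_indicator(0x64))   # insert (61s)
print(leap_indicator(0x24))   # no warning
```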
Re: REMINDER: LEAP SECOND
* Matthew Huff I saw that, but it says the bits are set before 23:59 on the day of insertion, but I was hoping that I could shut it down later than 23:59:59 of the previous day (8pm EST). The reason is FINRA regulations. We have to have the time synced once per trading day before the open according to the regulations. Again AIUI, and I'm no NTP expert so I hope someone corrects me if I'm wrong: If you don't configure the leapfile ntpd option, the Leap Indicator flag will flow down to your servers from the stratum-1 servers you're synchronising from (directly or indirectly). So what I think you could do is, on the 29th, remove all your upstream servers from your NTP server's config, and set fudge 127.127.1.0 stratum 3 or something like that so that clients will still want to sync to it. At that point, your NTP server's clock chip will be the reference clock, which might be drift-prone. To work around that, you could at 8pm on the 30th stop ntpd, manually sync the system clock with ntpdate, and start ntpd again. That should keep your NTP server's clock reasonably synchronised while providing your clients with (Leap Indicator-free) NTP service. I make no guarantees that the above will work the way I think it will, though... Try it at your own risk. Tore
Re: REMINDER: LEAP SECOND
* Matthew Huff That won't work. Being internally sync'ed isn't good enough for FINRA. All the machines must be synced to an external accurate source at least once per trading day. That was why I proposed running ntpdate on your (upstream-free since the 29th) NTP server(s) sometime on the 30th. That would synchronise its local clock with an external accurate source, without learning the Leap Indicator. Our plan is to disable our two stratum 1 servers, and our 3 stratum 2 servers before the leap second turnover, but to be 100% safe we would need to do that 24 hours before, but that would be a violation of FINRA regulations. If you run your own stratum-1 servers, can't you just opt not to configure leapfile? Assuming your own organisation is the only user of those servers, that is (certainly don't do that if it's a public server). After the leap second has passed, you can proceed to correct things. Your clients will then be 1s ahead of correct time, and will need to step/slew their clocks to get in sync. But maybe that's OK as far as FINRA's concerned... It looks like the safest thing for us to do is to keep our NTP servers running and deal with any crashes/issues. That's better than having to deal with FINRA. Maybe. I have no experience with FINRA. :-) Tore
AS4788 Telecom Malaysia major route leak?
I see tons of bogus routes show up with AS4788 in the path, and at least AS3549 is accepting them. E.g. for the RIPE NCC (193.0.0.0/21):

        *[BGP/170] 00:20:29, MED 1000, localpref 150
           AS path: 3549 4788 12859 I, validation-state: valid
         > to 64.210.69.85 via xe-1/1/0.0

Tore
Re: AS4788 Telecom Malaysia major route leak?
* Marty Strong via NANOG nanog@nanog.org

> It *looks* like GBLX stopped accepting the leak.

If so, it's a partial fix at best, I still see plenty of leaked routes, both via 3356 and 3549, e.g.:

tore@cr1-osl3> show route 195.24.168.98 all
Jun 12 12:03:54 +0200

inet.0: 544405 destinations, 1591203 routes (543086 active, 3 holddown, 526626 hidden)
+ = Active Route, - = Last Active, * = Both

195.24.160.0/19    *[BGP/170] 00:03:59, MED 2000, localpref 50, from 87.238.63.5
                      AS path: 3356 3549 4788 6939 39648 I, validation-state: unverified
                    > to 87.238.63.56 via ae0.0
                    [BGP/170] 00:05:24, MED 0, localpref 50, from 87.238.63.2
                      AS path: 3356 3549 4788 6939 39648 I, validation-state: unverified
                    > to 87.238.63.56 via ae0.0
                    [BGP ] 01:16:00, MED 25245, localpref 100
                      AS path: 3549 4788 6939 39648 I, validation-state: unverified
                    > to 64.210.69.85 via xe-1/1/0.0

It seems to have started around 08:47 UTC, that's when I got my first alarm from ring-sqa at least.

Tore
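A toy illustration of the kind of check a route-leak monitor performs: flag routes where an ASN known to be a stub/regional network shows up transiting in the middle of the AS path. The function is hypothetical and works on path strings for simplicity; real monitoring (e.g. ring-sqa) works from live BGP feeds:

```python
# Toy leak check: flag AS paths where a given ASN appears in the middle
# of the path (i.e. it is transiting), even though it is known to be a
# stub/regional network. Illustrative only; the paths below are the
# ones quoted in the post.

def transits_through(as_path: str, asn: int) -> bool:
    path = [int(a) for a in as_path.split()]
    # ASN is "in the middle": present, but neither first hop nor origin.
    return asn in path[1:-1]

print(transits_through("3356 3549 4788 6939 39648", 4788))  # True: 4788 transiting
print(transits_through("3549 4788", 4788))                  # False: 4788 is origin
```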
Re: Greenfield 464XLAT (In January)
* Baldur Norddahl baldur.nordd...@gmail.com

> The high tech solution is stuff like MAP where you move the cost out to the CPE. But then you need to control the CPE - if you have that then great. You would still want to sell a non-NAT (and MAP is NAT) to users that require a public IPv4 address, so you still need to go dual stack or use some tunnelling for that.

Hi Baldur,

MAP is *not* NAT; that's what's so neat about it. The users do get a public IPv4 address (or prefix!) routed to their CPE's WAN interface, towards which they can accept inbound unsolicited connections. The public IPv4 address could be port-restricted if the operator wants address sharing, but it does not have to be. You could do both at the same time, e.g., giving your premium users a /32 or /28, while the standard subscription includes a /32 with 4k ports.

I will grant you that MAP-T performs NAT (i.e., protocol translation) internally, but the translations that happen when a packet enters the MAP domain are reversed when it exits. So the IPv4 addresses are transparent end-to-end. MAP-E (and lw4o6 for that matter), on the other hand, has no form of NAT anywhere. (Unless you count the NAPT44 that sits between the subscriber's RFC1918 LAN segment and the CPE's WAN interface, but that's not exactly something that's unique to MAP.)

Nicholas: If I were you, before going down the 464XLAT route, I'd first look closely at these technologies, in the order given:

1) MAP (because it is fully stateless)
2) lw4o6 (because it is mostly stateless, i.e., no session tracking)
3) DS-Lite (which, like 464XLAT, is stateful, but you'll have way more CPEs to choose from than with 464XLAT, which is mostly for mobile)

Tore
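The "/32 with 4k ports" example above corresponds to MAP address sharing with a 4-bit PSID. A simplified sketch of the arithmetic, assuming equal power-of-two port slices and ignoring the port-set offset that RFC 7597 uses to exclude the well-known ports:

```python
# Simplified MAP address-sharing arithmetic: with a PSID of psid_len
# bits, 2**psid_len subscribers share one IPv4 address, each getting an
# equal slice of the 65536 ports. (Real MAP, RFC 7597, also applies a
# port-set offset so the well-known ports are excluded; omitted here.)

def sharing_ratio(psid_len: int) -> int:
    return 2 ** psid_len

def ports_per_subscriber(psid_len: int) -> int:
    return 65536 // sharing_ratio(psid_len)

# The "/32 with 4k ports" standard subscription from the post:
print(sharing_ratio(4), ports_per_subscriber(4))  # -> 16 4096
```

A premium "/32, no port restriction" subscription is simply the degenerate case psid_len=0: one subscriber, all 65536 ports.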
Re: Android (lack of) support for DHCPv6
* Lorenzo Colitti

> Remember, what I'm trying to do is avoid user-visible regressions while getting rid of NAT. Today in IPv4, tethering just works, period. No ifs, no buts, no requests to the network. The user turns it on, and it works.

*cough* https://code.google.com/p/android/issues/detail?id=38563

In particular comment 105 is illuminating. Android is apparently fully on board with mobile carriers' desire to break tethering, even going so far as to implement a feature whose *sole purpose* is to break tethering. Yet, at the same time, you refuse to implement DHCPv6 on WiFi because it *might*, as a *side effect*, break tethering. This does not strike me as very consistent.

If Android had instead simply refused to establish a mobile data connection to the mobile carriers that break tethering, then the refusal to implement DHCPv6 would make much more sense.

Tore
Re: Android (lack of) support for DHCPv6
* Lorenzo Colitti

> Tethering is just one example that we know about today. Another example is 464xlat.

You can't do 464XLAT without the network operator's help anyway (unless you/Google is planning on hosting a public NAT64 service?). If the network operator actively wants 464XLAT to be used, by providing DNS64/NAT64 service, then it seems fairly reasonable to assume that they're not going to deploy an IPv6/DHCPv6-only network that limits the number of IA_NA per attached node to 1.

> And that's not counting future applications that can take advantage of multiple IP addresses that we haven't thought of yet, and that we will have if we get stuck with there-are-more-IPv6-addresses-in-this-subnet-than-grains-of-sand-but-you-only-get-one-because-that's-how-we-did-it-in-IPv4 networks.

Of course. Hard to argue against imaginary things. :-)

On the other hand, there exist applications *today* that do require DHCPv6. One such example would be MAP, which IMHO is superior to 464XLAT both for the network operator (statelessness ftw) as well as for the end user (unsolicited inbound packets work, no NAT traversal required). MAP is provisioned with DHCPv6 (I-D.ietf-softwire-map-dhcp), so without DHCPv6 support in Android, MAP support in Android is a non-starter.

Tore
Re: Android (lack of) support for DHCPv6
* Lorenzo Colitti

>> On the other hand, there exist applications *today* that do require DHCPv6. One such example would be MAP, which IMHO is superior to 464XLAT both for the network operator (statelessness ftw) as well as for the end user (unsolicited inbound packets work, no NAT traversal required). MAP is provisioned with DHCPv6 (I-D.ietf-softwire-map-dhcp), so without DHCPv6 support in Android, MAP support in Android is a non-starter.
>
> Support for the DHCPv6 protocol, or support for assigning addresses from IA_NA?

I'm not 100% certain, but you can possibly run MAP without IA_NA. But I think you'll need the CE to be configured with a predictable IPv6 address so that the BR knows where to send the IPv6-encapsulated or -translated IPv4 packets. I don't see how that would work with SLAAC. But I'm not a MAP expert, so I'm open to being educated otherwise.

Anyway, here's a (hopefully constructive) suggestion on a way forward:

* Implement DHCPv6 client support (IA_NA, IA_TA, IA_PD .. the works)
* Upon network connection, request 2x IA_NA and 1x IA_PD (in addition to SLAAC):
** If you get addressing from SLAAC and/or IA_PD, accept the configuration and connect to the network.
*** If apps/services require additional addresses, self-assign them from the on-link/delegated prefix as needed.
** If you get 2x IA_NA, accept the configuration and connect to the network.
*** If apps/services require additional addresses, request additional IA_NA as needed. If additional IA_NAs are declined, either warn the user or trigger Android's already existing «avoided bad network» functionality.
** If you get no SLAAC or IA_PD, and IA_NA <= 1, then refuse to connect to the network (or, for a dual-stack network, connect IPv4-only). (I.e., same behaviour as on a DHCPv6-only network today.)

Why N=2? Because it's >1, and what you seem to be worried about is operators using N=1 without thought (because that's what we did in IPv4).
N=2 will confirm that's not the case for the given network, so I think confirming N=2 gives a much stronger indication that the network allows N=something reasonable than confirming N=1 would.

That said, I doubt that you can rely on the network accepting N=hundreds or more, neither for DHCPv6 IA_NA *nor* SLAAC, due to neighbour table limitations and DAD overhead (both delay and packets). If the future applications we're imagining need IPv6 addresses in that ballpark (which isn't *that* far-fetched - say a new address per connection, process, app, whatever), IA_PD is the only mechanism we have today that will work. If you start supporting IA_PD, my bet is that networks are going to start offering it - just like when you added 464XLAT.

Tore
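The connection policy proposed above can be sketched as a decision function. The function name and signature are hypothetical, not Android code; it only transcribes the bullet logic:

```python
# Sketch of the proposed connection policy (hypothetical, not Android
# code): connect if SLAAC or IA_PD provides addressing, or if the
# network grants at least two IA_NAs; otherwise refuse (or, on a
# dual-stack network, fall back to IPv4-only).

def should_connect_ipv6(slaac: bool, ia_pd: bool, ia_na_granted: int) -> bool:
    if slaac or ia_pd:
        return True   # can self-assign further addresses as needed
    if ia_na_granted >= 2:
        return True   # network demonstrably allows N > 1
    return False      # N <= 1: likely an IPv4-thinking deployment

print(should_connect_ipv6(slaac=False, ia_pd=False, ia_na_granted=2))  # True
print(should_connect_ipv6(slaac=False, ia_pd=False, ia_na_granted=1))  # False
print(should_connect_ipv6(slaac=True, ia_pd=False, ia_na_granted=0))   # True
```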
Re: Android (lack of) support for DHCPv6
* Dave Taht I am told that well over 50% of all android development comes from volunteer developers so rather than kvetching about this it seems plausible for an outside effort to get the needed features for tethering and using dhcpv6-pd into it. If someone wanted to do the work. https://android-review.googlesource.com/#/c/78857/ Tore
Re: Peering and Network Cost
* Mark Tinka mark.ti...@seacom.mu

> On 16/Apr/15 07:25, Tore Anderson wrote:
>
>> We're in a similar situation here; transit prices have come down so much in recent years (while IX fees are indeed stagnant) that I am certain that if I were to cut all peering and buy everything from a regional tier-2 instead, I'd be lowering my total MRC somewhat, without really reducing connectivity quality to my (former) peers.
>
> I wouldn't say exchange point prices are stagnant, per se. They may remain the same, but what goes up is the port bandwidth. It's not directly linear, but you get my point. Again, the burden is on the peering members to extract the most out of their peering links by having as much peering as possible.

You appear to be assuming that an IP transit port is more expensive than an IXP port with the same speed. That doesn't seem to always be the case anymore, at least not in all parts of the world, and I expect this trend to continue - transit prices seem to go down almost on a monthly basis, while the price lists of the two closest IXPs to where I'm sitting are dated 2011 and 2013, respectively.

Even if the transit port itself remains slightly more expensive than the IXP port like in the example Baldur showed, the no-peering alternative might still be cheaper overall, because even if you're peering away most of your traffic you'll still need to pay a nonzero amount for a (smaller or less utilised) transit port anyway.

Tore
Re: Peering and Network Cost
* Baldur Norddahl baldur.nordd...@gmail.com

> Transit cost is down but IX cost remains the same. Therefore IX is no longer cost effective for a small ISP.
>
> As an (non US) example, here in Copenhagen, Denmark we have two internet exchanges DIX and Netnod. We also have many major transit providers, including Hurricane Electric and Cogent.
>
> Netnod price for a 1 Gbps port is 4 SEK = 4500 USD / year http://www.netnod.se/ix/join/prices. DIX is 4 DKK = 5700 USD / year http://dix.dk/serviceinformation/
>
> HE.net is offering 1 Gbps flatrate for 450 USD / month list price = 5400 USD / year. Cogent can match that. So why would a small ISP pay 4500 USD for a service with no guarantee of how much traffic they will be able to peer away?

We're in a similar situation here; transit prices have come down so much in recent years (while IX fees are indeed stagnant) that I am certain that if I were to cut all peering and buy everything from a regional tier-2 instead, I'd be lowering my total MRC somewhat, without really reducing connectivity quality to my (former) peers.

For us, the primary reason that keeps us peering is DDoS prevention. Our traffic is mostly regional, so if a customer of mine gets hit with a volumetric DDoS attack that would saturate my IP transit lines and cause collateral damage, that's no big deal, as we can just RTBH the customer's prefix towards our transit providers. The customer is only mildly inconvenienced by this as, say, 90% of his traffic goes to our peers. Without peering the attack would succeed, because my RTBH would completely offline my customer.

Tore
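The figures quoted in the post can be plugged into a back-of-the-envelope MRC comparison. This is a deliberately crude model (the functions are hypothetical): it divides annual fees by 12, ignores cross-connects, and assumes the transit bill scales with the unpeered fraction of traffic, which the flat-rate offer quoted would not actually do:

```python
# Back-of-the-envelope MRC comparison using figures from the post.
# Crude model for illustration only: annual fees / 12, no cross-connect
# costs, and transit assumed to scale with the unpeered traffic share.

IX_PORT_ANNUAL_USD = 4500        # Netnod 1 Gbps port, per the post
TRANSIT_FLAT_ANNUAL_USD = 5400   # HE.net 1 Gbps flat rate, per the post

def monthly_cost_ix_plus_transit(peered_fraction: float) -> float:
    # Even with peering, a nonzero transit bill remains for the rest.
    return (IX_PORT_ANNUAL_USD
            + TRANSIT_FLAT_ANNUAL_USD * (1 - peered_fraction)) / 12

def monthly_cost_transit_only() -> float:
    return TRANSIT_FLAT_ANNUAL_USD / 12

# Even peering away 50% of traffic, transit-only wins in this model:
print(round(monthly_cost_ix_plus_transit(0.5)))  # -> 600
print(round(monthly_cost_transit_only()))        # -> 450
```

The model supports the point being made: since some transit is needed anyway, the IX port has to peer away a very large share of traffic before it pays for itself.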
Re: IPv6 allocation plan, security, and 6-to-4 conversion
* William Herrin

> T-Mobile uses something called 464XLAT. Don't let the translation part fool you: it's a tunnel. IPv4 in one side, IPv4 out the other.

464XLAT is not a tunnel. Protocol translation is substantially different from tunneling. With tunneling, the original layer-3 header is kept intact as it is encapsulated inside another layer-3 header. With translation, the original layer-3 header is removed and replaced with another layer-3 header. They come with a different set of trade-offs, such as:

- Protocol translation may be lossy (e.g., exotic IPv4 options may not survive the translation to IPv6 and would therefore not reappear after translation back to IPv4). Tunneling, OTOH, is not lossy.
- Tunneling moves the original layer-4 header into another encapsulation layer, so e.g. an ACL attempting to match an IPv6 HTTP packet using something like next-header tcp, dst port 80 will not work. With translation, it will.

> Kabel Deutschland uses something called Dual Stack Lite. It's also a tunnel: the Kabel-owned CPE encapsulates the customer's IPv4 packets within IPv6 and delivers them to Kabel's IPv4 carrier NAT box.

Yep. DS-Lite is indeed tunneling.

> So sure, if you don't mind dissembling a little bit you can say that they moved their infrastructure to IPv6-only. In my mind, tunnelling IPv4 over IPv6 where it both enters and exits the carrier's area of control as an IPv4 packet doesn't count as IPv6-only.

I guess we disagree about the definitions, then. In my view, a dual-stack network is one where IPv4 and IPv6 are running side-by-side like ships in the night with no fate sharing. You might be running two different IGP protocols (like OSPFv2 and OSPFv3) and a duplicated set of iBGP sessions. ACLs and the like must exist both for IPv4 and IPv6. And so on. If you turn off one protocol, the other one keeps on running just like before.

This is in contrast with a single-stack network; turn off that single stack, and nothing works. That doesn't mean that it cannot simultaneously transport other layer-3 protocols across that single-stack network; just that there is a clear distinction between the main layer-3 protocol and the others being transported across it. You might very well simultaneously transport IPv6, AppleTalk, and IPX/SPX across an IPv4-only network - but that doesn't mean that the network is quad-stack - IMHO, it's still single-stack IPv4.

> On Fri, Jan 30, 2015 at 11:44 AM, Tore Anderson t...@fud.no wrote:
>
>> If everyone could just dual-stack their networks, they might as well single-stack them on IPv4 instead; there would be no point whatsoever in transitioning to IPv6 for anyone.
>
> What do you mean if? Carrier NAT means we *can* single-stack on IPv4 for the next 20 to 30 years, if we're so inclined.

I suppose that's true - if you ignore that a number of other folks are deploying IPv6 to deal with their IPv4 exhaustion, and that products and services are being put to market that recommend the use of IPv6 connectivity above NATed IPv4 (e.g., Xbox One). So much earlier than 30 years from now you'll be wanting to have IPv6 in your network anyway, and once you come to that realisation you might also realise that operating a dual-stack network for those 30 years is not going to be any fun at all due to the increased complexity it causes. Especially if the IPv4 part of that dual-stack network is in itself getting increasingly complex due to more and more NAT being added to deal with growth.

So IMHO dual-stack is a bad recommendation, or at least it is rather shortsighted. If you're in a position to do single-stack IPv6-only with IPv4 as a service (like T-Mobile USA or Kabel Deutschland), you'll end up with a much simpler network that will be much easier to maintain over the years. This also facilitates the use of IPv4 address sharing solutions like lw4o6 and MAP, whose stateless nature makes them vastly superior to traditional stateful Carrier Grade NAT44 boxes.

YMMV, of course.

Tore
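One concrete way translation-based mechanisms (NAT64/SIIT) keep IPv4 addresses visible end-to-end, as argued above, is by embedding the IPv4 address directly into the IPv6 address (RFC 6052). A quick sketch using Python's standard ipaddress module and the well-known prefix 64:ff9b::/96:

```python
import ipaddress

# RFC 6052 address embedding: with a /96 translation prefix, the IPv4
# address simply becomes the low 32 bits of the IPv6 address. This is
# what lets translation preserve addresses transparently end-to-end.

def embed_ipv4(prefix: str, v4: str) -> str:
    p = int(ipaddress.IPv6Address(prefix))
    v = int(ipaddress.IPv4Address(v4))
    return str(ipaddress.IPv6Address(p | v))

print(embed_ipv4("64:ff9b::", "192.0.2.1"))  # -> 64:ff9b::c000:201
```

With tunneling, by contrast, the IPv4 packet rides intact inside an IPv6 payload; nothing about the outer IPv6 addresses needs to encode the inner IPv4 ones.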
Re: IPv6 allocation plan, security, and 6-to-4 conversion
Hi Baldur,

* Baldur Norddahl baldur.nordd...@gmail.com

> On 1 February 2015 at 20:10, Tore Anderson t...@fud.no wrote:
>
>> - Tunneling moves the original layer-4 header into another encapsulation layer, so e.g. an ACL attempting to match an IPv6 HTTP packet using something like next-header tcp, dst port 80 will not work. With translation, it will.
>
> But on the other hand you will mess up with the routing of the network. In our network both IPv4 and IPv6 are routed to different transit points depending on the destination. With translation you need to ensure that the traffic passes a translation point before it leaves the network.

Sure, but you could scatter these translation points all over your network, so that the flow of traffic remains optimal. You could enable the translation functionality on your aggregation and/or your border routers, for example. The traffic would need to pass those anyway, so there's no real change to how traffic is being routed.

> If that translation involves NAT, then you also need to ensure that the return traffic hits the same translation device.

No, with stateless solutions like MAP and lw4o6, there is no such requirement. Anycast them or use ECMP towards them however way you like. This is in my view one of the great advantages of such solutions over IPv4 CGN. To the best of my knowledge, there exists no stateless IPv4 sharing mechanism. So the CGN-ed traffic must flow bidirectionally across the same translation device, which then could easily become a choke point. Also, should the CGN device fail, all the existing sessions it was handling would be disrupted.

>> In my view, a dual-stack network is one where IPv4 and IPv6 are running side-by-side like ships in the night with no fate sharing. You might be running two different IGP protocols (like OSPFv2 and OSPFv3) and a duplicated set of iBGP sessions. ACLs and the like must exist both for IPv4 and IPv6. And so on. If you turn off one protocol, the other one keeps on running just like before.
>
> By that definition my dual stack network is single stack: kill ipv4 and MPLS goes down = everything is down. On the other hand there are actually two IPv4 networks, since the IPv4 network under MPLS does not carry internet traffic directly. BOTH IPv4 and IPv6 can be said to be tunneled through the MPLS network.

While MPLS certainly blurs the lines a bit, based on your description I think that your network could reasonably be described as single-stack MPLS/IPv4-only at its core, while IPv6 (using 6PE I guess?) and another instance of IPv4 (distinct from the one used for MPLS signaling) is being transported as a service across that single-stack network.

> I do not see the point in making this mess even bigger by adding another layer by shoehorning v4 traffic into v6 packets.

Agreed, considering that you seem to already be enjoying the benefits of having a single-stack network. That is after all what I am saying folks should be considering, rather than automatically going down the dual-stack road. While you're using MPLS instead of IPv6, the principle is similar.

> I fail to see the complexity. You are advocating that I should have spent money on more equipment and force my users to use a ISP supplied CPE (currently my users can use any CPE they want).

I'm just advocating that people should seriously *consider* it, especially if they're building something new. I'm not saying it's for everyone everywhere, nor for you specifically. For a provider that controls the user equipment, going IPv6-only is certainly a possibility, as demonstrated by T-Mobile USA and Kabel Deutschland. If OTOH there is a requirement to support legacy IPv4-only CPEs, then clearly IPv6-only isn't going to work out too well.

Tore
Re: IPv6 allocation plan, security, and 6-to-4 conversion
* William Herrin

> nat64/nat46 - allows an IPv6-only host to interact in limited ways with IPv4-only hosts. Don't go down this rabbit hole. This will probably be useful in the waning days of IPv4 when folks are dismantling their IPv4 networks but for now the corner cases will drive you nuts. Plan on dual-stacking any network which requires access to IPv4 resources such as the public Internet.

For many folks, that's easier said than done. Think about it: If everyone could just dual-stack their networks, they might as well single-stack them on IPv4 instead; there would be no point whatsoever in transitioning to IPv6 for anyone.

Tore
Re: IPv6 allocation plan, security, and 6-to-4 conversion
* Mel Beckman

> Um, haven't you heard that we are out of IPv4 addresses? The point of IPv6 is to expand address space so that the Internet can keep growing. Maybe you don't want to grow with it, but most people do. Eventually IPv4 will be dropped and the Internet will be IPv6-only. Dual-stack is just a convenient transition mechanism.

Mel,

Dual-stack was positioned to be a convenient transition mechanism 15 years ago (to take the year when RFC 2893 was published). However, that train left the platform mostly empty years ago, when the first RIRs started to run out of IPv4 addresses. After all, we were supposed to have dual-stack everywhere *before* we ran out of IPv4. That didn't happen.

The key point is: In order to run dual-stack, you need as many IPv4 addresses as you do to run IPv4-only. Or to put it another way: If you don't have enough IPv4 addresses to run IPv4-only, then you don't have enough IPv4 addresses to run dual-stack either.

Sure, you can squeeze some more lifetime out of IPv4 by adding more NAT (something which is completely orthogonal to deploying IPv6 simultaneously). However, if you're already out of IPv4, and you already see no way forward except adding NAT, then you should seriously consider doing the NAT (or whatever backwards compat mechanism you prefer) between the residual IPv4 internet and your IPv6 infrastructure, instead of doing it between IPv4 and IPv4. Running single-stack is simply much easier and less complex than dual-stack, and once your infrastructure is based on an IPv6-only foundation, you don't have to bother with any IPv4-IPv6 transition project ever again.

Tore
Re: IPv6 allocation plan, security, and 6-to-4 conversion
* Baldur Norddahl

> Single stacking on IPv6 is nice in theory. In practice it just doesn't work yet. If you as an ISP tried to force all your customers to be IPv6 single stack, you would go bust.

Kabel Deutschland, T-Mobile USA, and Facebook are examples of companies that have already moved, or are in the process of moving, their network infrastructure to IPv6-only. Without going bust.

What you *do* need is some form of connectivity to the IPv4 internet. But there are smarter ways to do that than dual stack. Seriously, if you're building a network today, consider making IPv4 a legacy app or service running on top of an otherwise IPv6-only infrastructure. Five years down the road you'll thank me for the tip. :-)

Tore
Re: DDOS solution recommendation
* Roland Dobbins rdobb...@arbor.net

> On 12 Jan 2015, at 16:19, Tore Anderson wrote:
>
>> I'd love to use flowspec over D/RTBH, but to me it seems like vapourware.
>
> I meant on your own infrastructure, apologies for the confusion.

Right. So if I first need to accept the traffic onto my infrastructure before I can discard it, I'm dead in the water anyway: My uplinks will sit there at 100% ingress utilisation, dropping legitimate traffic. /32 or /128 D/RTBH announcements towards my transits are my only real option at this point. That helps protect against collateral damage, and if the customer's audience is local, it can also restore full operation for the attacked customer's primary markets (which are usually reached via peers instead of transits).

For attacks that are conveniently sized smaller than my upstream capacity, I could see that flowspec could be useful, but not in a unique way, as inside my own network I can easily distribute targeted stateless discard ACLs in many other ways too (I use Netconf currently).

> Transit providers utilizing Juniper aggregation edge routers could do it now - why they don't, I don't know.

I'd definitely be willing to pay a premium for such a feature.

Tore
Re: DDOS solution recommendation
* Roland Dobbins rdobb...@arbor.net

> On 11 Jan 2015, at 20:52, Ca By wrote:
>
>> 3. Have RTBH ready for some special case.
>
> S/RTBH and/or flowspec are better (S/RTBH does D/RTBH, too).

But are there any transit providers that support flowspec these days? As I understand it, only GTT used to, but they stopped. I'd love to use flowspec over D/RTBH, but to me it seems like vapourware.

Tore
Re: Charging fee for BGP prefix per /24?!
* Yucong Sun

> My recent inquiry to some network provider reveals that they are charging fee for per /24 announced. Obvious that would means they get to charge a lot with little to none efforts on their side. In a world we are charging total bytes transferred instead of bps on uplinks, i can't say I'm surprised that much. But does anyone else had same experience? Did you pay? Is this the new status quo now?

Haven't encountered this myself, but putting a price on DFZ routing slots seems like a Good Thing to me.

Tore
Re: IPv6 Default Allocation - What size allocation are you giving out
* Baldur Norddahl

> Why do people assign addresses to point-to-point links at all? You can just use a host /128 route to the loopback address of the peer. Saves you the hassle of coming up with new addresses for every link.

Why do you need those host routes? Most IPv6 IGPs work just fine without global addresses or host routes.

https://tools.ietf.org/html/draft-ietf-opsec-lla-only-11

Tore
Re: What Net Neutrality should and should not cover
* William Herrin

> On Sun, Apr 27, 2014 at 2:05 AM, Rick Astley jna...@gmail.com wrote:
>
>> #3 On paid peering: I think this is where people start to disagree but I don't see what should be criminal about paid peering agreements. More specifically, I see serious problems once you outlaw paid peering and then look at the potential repercussions that would have.
>
> Double-billing Rick. It's just that simple. Paid peering means you're deliberately billing two customers for the same byte -- the peer and the downstream. And not merely incidental to ordinary service - the peer specifically connects to gain access to customers who already pay you and no one else. Where those two customers have divergent interests, you have to pick which one you'll serve even as you continue to bill both. That's a corrupt practice.

It's not just that simple. If, for example, you ask for a peering with me, the first thing I'll do is to take a close look at how the traffic between our two networks is currently being routed. If I see that I have no monetary or technical gain from setting up that peering with you, perhaps because the traffic is currently flowing via an already existing peering of mine (with your upstream, say), or via a transit port of mine that's not exceeding its CDR, then I'd probably want you to cover my costs of setting up that peering before accepting, at the very least.

Even if I was exceeding the CDR on my transit ports, it's not at all certain that accepting a peering with you would even be a break-even proposition for me. Keep in mind that unlike routers and line cards, IP transit service *is* dirt cheap these days.

So no, refusing a peering or requiring the would-be peer to pay for the privilege isn't *necessarily* corrupt practice. It Depends.

Tore
Re: misunderstanding scale
* William Herrin

> On Sat, Mar 22, 2014 at 8:19 PM, Randy Bush ra...@psg.com wrote:
>>> don't believe for a moment that v6 to v4 protocol translation is any less ugly than CGN.
>> it can be stateless
>
> You're smarter than that.

https://tools.ietf.org/html/rfc6145
https://tools.ietf.org/html/draft-ietf-softwire-map-t-05
https://tools.ietf.org/html/draft-anderson-siit-dc-00

Tore
Re: misunderstanding scale (was: Ipv4 end, its fake.)
* John Levine

> Also, although it is fashionable to say how awful CGN is, the users don't seem to mind it at all.

You might just be looking in the wrong places. Try searching for «playstation nat type 3» or «xbox strict nat».

Tore
Re: misunderstanding scale
* Nick Hilliard

> the level of pain associated with continued deployment of ipv4-only services is still nowhere near the point that ipv6 can be considered a viable alternative.

This depends on who you're asking; as a blanket statement it's demonstrably false: For the likes of T-Mobile USA¹ and Facebook², or even myself³, IPv6-only isn't just an «alternative». It's «happening».

[1] http://www.dslreports.com/shownews/TMobile-Goes-IPv6-Only-on-Android-44-Devices-126506
[2] https://www.dropbox.com/s/doazzo5ygu3idna/WorldIPv6Congress-IPv6_LH%20v2.pdf
[3] http://www.ipspace.net/IPv6-Only_Data_Centers

Tore
Re: BGP multihoming
* Tore Anderson

> * Baldur Norddahl
>
>> Is assigning a /24 from my own PA space for the purpose of BGP multihoming considered sufficient need?
>
> Not with current policies, no.

That was then. With current policies: yes.

To elaborate a bit, the RIPE Community just reached consensus on a policy change that makes the size and the purpose of an assignment entirely a local decision. That means that if you and your customer agree that a /X is needed for purpose Y, and you as the LIR have the available space and the willingness to make that assignment, you are now free to make it. The new IPv4 policy does not mandate any limits to what X and Y might be, except for the fact that Y must somehow involve «operating a network» (your use case certainly qualifies).

Tore
Re: Updated ARIN allocation information
* Owen DeLong

> In answer to Tore's statement, this block does not apply the standard justification criteria and I think you would actually be quite hard pressed to justify a /24 from this prefix. In most cases, it is expected that these would be the IPv4 address pool for the public facing IPv4 side of a NAT64 or 464xlat service. Most organizations probably only need one or two addresses and so would receive a /28. It is expected that each of these addresses likely supports several thousand customers in a service provider environment.

This latter expectation of over-subscription is not echoed by the policy text itself. One of the valid usage examples mentioned («key dual stack DNS servers») would also be fundamentally incompatible with a requirement of over-subscription.

If you look at the common transitional technologies you'll see that not all of them even support over-subscription. In alphabetical order:

- 6RD: No over-subscription possible, would require at least one IPv4 address per subscriber plus additional addressing required for the transport/access network.
- 6PE/6VPE: No over-subscription possible, the infrastructure must be numbered normally with IPv4.
- DS-Lite (AFTR): Over-subscription possible, but it's entirely reasonable to want to make the ratio as low as possible, in order to provide as many source ports as possible to the subscriber, to ease abuse handling, and so on.
- MAP: Similar to DS-Lite, but is less flexible with regards to over-subscription, as all users in a MAP domain must get the same amount of ports. Thus the maximum over-subscription you can achieve is limited by your most active subscriber in his peak period of use, i.e., if you have a subscriber whose usage peaks at 20k ports, then that MAP domain can only support a 2:1 over-subscription ratio. MAP can also be configured in a not over-subscribed 1:1 mode.
- NAT64: Same as DS-Lite.
- SIIT: No over-subscription possible, as it's by design a 1:1 mapping.

That said, the policy language does say «ARIN staff will use their discretion when evaluating justifications». So I suppose it is theoretically possible that the ARIN staff will do their best Dr. Evil impression, coming up with a big number N, and require requestors to have an N:1 over-subscription ratio to qualify. However, that would be better described as indiscretion, not discretion, IMHO. After all, the RIRs are book-keepers, not network operators; if a network operator makes a reasonable request, it isn't the RIR's place to second-guess their network deployment. If ARIN is doing that, they're overstepping.

So in summary, it seems to me that it is pretty easy to make a reasonable request for a /24 under this particular policy. And especially considering the immense routing benefit the /24 will have over all the other possible prefix lengths that can be requested (persuading providers/peers to accept /28s might be done on a small scale, but just won't work if you need global connectivity, and global connectivity is what end users expect), the only realistic outcome I can see is that [almost] all the requestors will go ahead and ask for the /24.

We'll just have to wait and see, I guess.

Tore
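The 20k-port MAP example above can be checked with a small calculation. Since every subscriber in a MAP domain gets the same power-of-two share of the 65536 ports, the busiest subscriber caps the sharing ratio for the whole domain (function hypothetical, port-set offset/excluded ports ignored for simplicity):

```python
# Checking the 20k-port example: in MAP every subscriber in a domain
# gets the same power-of-two share of the 65536 ports, so the busiest
# subscriber determines the maximum over-subscription ratio.

def max_sharing_ratio(peak_ports: int) -> int:
    ratio = 1
    while 65536 // (ratio * 2) >= peak_ports:
        ratio *= 2
    return ratio

print(max_sharing_ratio(20000))  # -> 2  (32768 ports each; 4:1 gives only 16384)
print(max_sharing_ratio(4000))   # -> 16 (4096 ports each)
```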
Re: Updated ARIN allocation information
* Mark Andrews
> I understand this but this block changes the status quo. It is a
> policy changer. AFAIK ARIN hasn't done allocations to the /28 level
> like this in the past. This is all new territory.

It's not exactly new. Like I've mentioned earlier in this thread, the
RIPE NCC has granted assignments smaller than /24 to requestors since,
well, forever. There are currently 238 such assignments listed in
delegated-ripencc-extended-latest.txt. However, these microscopic
assignments have proven hugely unpopular, accounting for only a
fraction of a percent of the total (there are 27733 assignments equal
to or larger than /24 in the same file).

What I fail to understand from this thread is the apparent expectation
that these smaller-than-/24 delegations from ARIN will be popular. As
I read the policy in question, the requestors may get a /24 instead.
That's a pretty small block to begin with and trivial to justify, and
given the human tendency to grab as much of something as you can
(especially when you in all likelihood cannot get nearly as much as
you actually need), coupled with the fact that a /24 is likely to be
immensely more useful than anything smaller...well, I just don't see
why we shouldn't realistically expect that pretty much all of the
assignments made from this block will be exactly /24, and that the
exceptions that prove the rule will account for less than 1% of the
total - just like we've seen happen in the RIPE region.

Oh well. Time will tell, I suppose.

Tore
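The RIPE figures quoted above work out as follows (my own quick
calculation, not part of the original message):

```python
# Share of RIPE region assignments smaller than /24, using the two
# counts from delegated-ripencc-extended-latest.txt quoted above.
smaller_than_24 = 238
slash24_or_larger = 27733

share = 100 * smaller_than_24 / (smaller_than_24 + slash24_or_larger)
print(f"{share:.2f}%")  # 0.85% - indeed only a fraction of a percent
```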
Re: FW: Updated ARIN allocation information
* Justin M. Streiner
> In the worst case, this would add another 262,144 routes (/10 fully
> assigned, and all assignments are /28s) to the global IPv4 route
> view. Realistically, the number will be a good bit smaller than
> that, but only time will tell for sure exactly how much smaller.
> Wash/rinse/repeat for any other RIR that adopts a similar policy.

I wouldn't worry if I were you. I'll wager you $100 that pretty much
all of the people requesting a block from ARIN under this policy (or
any other) are going to go for a /24 (or larger). There is some
precedent; RIPE policy has not mandated a minimum assignment size for
IPv4 PI, at least not in the last decade, yet the NCC has made almost
no assignments smaller than /24.

Tore
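The quoted worst-case figure is straightforward to verify (my own
arithmetic, not part of the original exchange):

```python
# Route count for a /10 fully carved into /28s vs. into /24s.
all_28s = 2 ** (28 - 10)  # number of /28s in a /10
all_24s = 2 ** (24 - 10)  # number of /24s in a /10
print(all_28s)  # 262144 - the worst case quoted above
print(all_24s)  # 16384  - if every requestor takes a /24 instead
```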
Re: Are specific route objects in RIR databases needed?
* Job Snijders
> On Thu, Jan 30, 2014 at 06:51:59PM +0200, Martin T wrote:
>> for example there is a small company with /22 IPv4 allocation from
>> RIPE in European region. This company is dual-homed and would like
>> to announce 4x /24 prefixes to both ISPs. Both ISPs update their
>> prefix-lists automatically based on records in RIPE database. For
>> example Level3 uses this practice at least in Europe. If this small
>> company creates a route object for its /22 allocation, then is it
>> enough? Theoretically this would cover all four /24 networks. Or in
>> which situation is it useful/needed to have a route object for each
>> /24 prefix?
> You should create a route object for each route that you announce;
> if you announce 4 x /24 you should create a route: object for each
> /24.

+1

> ps. Can you please send 20 dollarcent per /24 to my paypal account
> (j...@instituut.net) with the reference deaggregation fee?

Indeed.

Martin, I'd suggest announcing the 4 x /24s to each ISP tagged with
the no-export community in order to achieve whatever you are trying
to do, *in addition* to the covering /22. That way you're not
polluting Job's, my, and everyone else's routing tables more than
necessary, only your own ISPs' - but then again you're actually
paying them for the privilege.

Tore
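For illustration, a RIPE database route object for one of the four
/24s might look like this (a sketch with placeholder prefix, origin AS
and maintainer, not taken from the original thread):

```
route:      192.0.2.0/24
descr:      One of the four announced /24s
origin:     AS64500
mnt-by:     EXAMPLE-MNT
source:     RIPE
```

One such object is needed per announced route, plus one for the
covering /22 if that is announced as well.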
Re: BGP multihoming
* Baldur Norddahl
> Apologies for a RIPE question on NANOG, although I believe this
> issue will soon enough be relevant for the ARIN region as well.

Relevant perhaps, but as the policies differ, so may the correct
answers...

> I had a customer ask if we could provide him with BGP such that he
> could be multihomed. He already has 128 IP addresses from another
> ISP. Obviously a /25 is a no-go for multihoming as everyone is going
> to ignore his route. I would then need to help him with acquiring a
> /24 PI. Which appears to be impossible as RIPE no longer assigns PI
> space and PI can not be reassigned and thus be bought.

There is another option: if your customer becomes a RIPE NCC member
(i.e., an LIR), he'll get a PA /22. (Of course, you could offer to
perform all the administrative work needed to start and operate an
LIR on your customer's behalf, for a reasonable fee.)

> Is assigning a /24 from my own PA space for the purpose of BGP
> multihoming considered sufficient need?

Not with current policies, no, as the multihoming clause applies
specifically to PI assignments, not PA. However, if your customer can
show that he'll be using at least 128 addresses (i.e., 50% of a /24)
within a year, he does qualify for an assignment of a /24. Plans to
renumber out of his current /25 would count towards that.

Tore
Re: Will a single /27 get fully routed these days?
* Sander Steffann
> But more important: which /10 is set aside for this? It is not
> listed on https://www.arin.net/knowledge/ip_blocks.html

Probably 23.128/10:

arin||ipv4|23.128.0.0|4194304||reserved|

Tore
Re: ddos attacks
* James Braunegg
> Of course for any form of Anti DDoS hardware to be functional you
> need to make sure your network can route and pass the traffic so you
> can absorb the bad traffic to give you a chance cleaning the
> traffic.

So in order for an Anti-DDoS appliance to be functional, the network
needs to be able to withstand the DDoS on its own. How terribly
useful.

Tore
Re: ddos attacks
* Dobbins, Roland
> Once again, nothing in my post said or referred to bandwidth;

The post of mine, to which you replied, did.

Perhaps if you had taken your own advice quoted below when replying to
me, Nick wouldn't have been contextually confused.

Tore

> In future, it might be a good idea to ensure that the points one
> attempts to make actually apply to the specific post to which one is
> replying.
Re: IP Fragmentation - Not reliable over the Internet?
* Owen DeLong
> On Aug 27, 2013, at 07:33 , valdis.kletni...@vt.edu wrote:
>> Saku Ytti and Emile Aben have numbers that say otherwise. And there
>> must be a significantly bigger percentage of failures than pretty
>> close to 0, or Path MTU Discovery wouldn't have a reputation of
>> being next to useless.
> No, their numbers describe what happens to single packets of
> differing sizes. Nothing they did describes results of actually
> fragmented packets.

Yes, it did. Hint: 1473 + 8 + 20

Tore
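The hint spelled out (my own arithmetic): a ping with a 1473-byte
payload plus the 8-byte ICMP header and the 20-byte IPv4 header is one
byte larger than a standard 1500-byte MTU, so such a probe does leave
the sender as two fragments:

```python
ICMP_HEADER = 8
IPV4_HEADER = 20
MTU = 1500

total = 1473 + ICMP_HEADER + IPV4_HEADER
print(total)  # 1501 - one byte over a 1500-byte MTU, so fragmentation occurs

# The first fragment fills the MTU; its L3 payload (1480 bytes) is a
# multiple of 8, as fragment offsets require. The second fragment
# carries the single remaining byte.
first_l3_payload = MTU - IPV4_HEADER
second_l3_payload = (total - IPV4_HEADER) - first_l3_payload
print(first_l3_payload, second_l3_payload)  # 1480 1
```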
Re: Evaluating Tier 1 Internet providers
* Richard Hesse
> On Tue, Aug 27, 2013 at 12:14 PM, Joe Abley jab...@hopcount.ca wrote:
>> - response you can expect when you call one day and say our 10GE is
>> maxed out with inbound traffic from apparently everywhere, it has
>> been going on for an hour, please help
> That was good for a laugh. If it's a DoS, you know what the answer
> already is. We no longer offer filtering for any of our customers.
> You must upgrade to the DDoS prevention service. We've actually made
> a list of other companies that share our providers' downstream links
> in each facility and reached out to them. We get them to call up and
> complain to said tier1 provider that something is affecting our
> traffic. That usually gets filters installed - otherwise no dice.

Several providers have a self-service blackholing functionality which
may alleviate DDoS attacks. Typically you announce the attacked /32 or
/128 to your upstreams, tagged with some special blackhole community,
and/or to a special multihop BGP session dedicated for blackholing
purposes. Doing so will cause your upstreams to automatically drop the
attack traffic within their network, *before* it gets to saturate
your uplinks.

Clearly, this is a blunt and last-resort type of tool which will
cement the efficiency of the attack from a global perspective, but
that may be an acceptable trade-off depending on the circumstances;
you may prevent collateral damage from impacting your other customers,
and cutting off the global attack traffic might enable the attacked
customer to serve his primary markets just fine through local peering
sessions, regional transits, and so forth.

I'm not buying transit from a network that doesn't give me such
blackholing functionality, FWIW.

Tore
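A sketch of what such an announcement could look like on the customer
side, here in BIRD-style configuration (purely illustrative; the
prefix, filter name and community value are placeholders - each
provider documents its own blackhole community and session setup):

```
# Hypothetical export filter towards an upstream's blackhole-enabled
# BGP session: announce only the attacked /32, tagged with a
# blackhole community, and nothing else.
filter upstream_blackhole
{
    if net = 192.0.2.66/32 then {
        bgp_community.add((65535, 666));  # placeholder community value
        accept;
    }
    reject;
}
```

The upstream then drops all traffic towards the tagged /32 at its own
borders, sparing the customer's uplinks.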
Re: IANA Reference to hopopt as a protocol
* David Edelman
> Does anyone have an explanation for the IPv6 hopopt appearing as
> protocol value 0 in
> http://www.iana.org/assignments/protocol-numbers?

It's defined in RFC 2460, section 4.3. Which is linked to from the
reference column of the page you linked to...

Tore
Re: IP4 address conservation method
* Blake Hudson
> One thing not mentioned so far in this discussion is using PPPoE or
> some other tunnel/VPN technology for efficient IP utilization. The
> result could be zero wasted IP addresses without the need to resort
> to non-routable IP addresses in a customer's path (as the pdf
> suggested) and without some of the quirkiness or vendor lock-in of
> using ip unnumbered. PPPoE (and other VPNs) have many of the same
> downsides as mentioned above though, they require routing cost and
> increase the complexity of the network. The question becomes which
> deployment has more cost: the simple, yet wasteful, design or the
> efficient, but complex, design.

shameless plug alert

Or, simply just use IPv6, and use a stateless translation service
located in the core network to provide IPv4 connectivity to the
public Internet services. This allows for 100% efficient utilisation
of whatever IPv4 addresses you have left - nothing needs to go to
waste due to router interfaces, power-of-2 subnet overhead, internal
servers/services that have no Internet-available services, etc. - all
without requiring you to do anything special on the server/application
stacks to support it (like setting up tunnel endpoints), adding
dual-stack complexity into your network, or introducing any form of
stateful translation or VPN service into your network.

Here are some more resources:

http://fud.no/talks/20130321-V6_World_Congress-The_Case_for_IPv6_Only_Data_Centres.pdf
http://tools.ietf.org/html/draft-anderson-siit-dc-00

In case you're interested in more, Ivan Pepelnjak and I will host a
(free) webinar about the approach next week. Feel free to join!
http://www.ipspace.net/IPv6-Only_Data_Centers

BTW: I hear Cisco has implemented support for this approach in their
latest ASR1K code, although I haven't confirmed this myself yet.

Tore
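The stateless translation approach maps each public IPv4 address into
IPv6 by embedding it in a translation prefix, per RFC 6052. A minimal
sketch in Python (the prefix and addresses are examples; a real
deployment would typically use its own network-specific prefix):

```python
import ipaddress

def embed_ipv4(prefix96: str, v4: str) -> ipaddress.IPv6Address:
    """Embed an IPv4 address in the last 32 bits of a /96 translation
    prefix (the simplest RFC 6052 case)."""
    prefix = ipaddress.IPv6Network(prefix96)
    assert prefix.prefixlen == 96
    return ipaddress.IPv6Address(
        int(prefix.network_address) | int(ipaddress.IPv4Address(v4))
    )

# 64:ff9b::/96 is the well-known translation prefix; 198.51.100.7 is
# a documentation address standing in for a server's public IPv4
# address.
print(embed_ipv4("64:ff9b::/96", "198.51.100.7"))  # 64:ff9b::c633:6407
```

The translator statelessly rewrites headers between the IPv4 form and
the embedded IPv6 form in both directions, which is why no per-flow
state (and no address waste) is needed.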
Re: It's the end of the world as we know it -- REM
* Owen DeLong
> Quite the contrary… I personally think that the abysmal rate of IPv6
> adoption among some content providers (Are you listening, Amazon,
> Xbox, BING?) is just plain shameful.

FWIW, www.bing.com resolves to IPv6 addresses from where I'm sitting
(Oslo), and the page seems to load over IPv6 as well.

Also, Amazon provides some form of IPv6 (I believe it's based on 6RD
or something similar, though). At least, the NLNOG RING has six
Amazon-hosted nodes, all with IPv6 enabled
(amazon0{1..6}.ring.nlnog.net). All of them respond to ICMPv6 pings
from here. Whether or not the average Amazon customer chooses to
enable IPv6 is another story, though...

Tore
Re: It's the end of the world as we know it -- REM
* Andrew Latham
> I have sadly witnessed a growing number of businesses with /24s
> moving to colocation/aws networks and not giving up their unused
> network space. I assume this will come into play soon.

A couple of /24s being returned wouldn't make a significant
difference when it comes to IPv4 depletion. Heck, not even a couple
of /8s would. Trying to reclaim and redistribute unused space would
be a tremendous waste of effort.

> I have already read the news of blackmarket sales of network
> allocations in Europe.

Interesting. Do you have a link or some other kind of reference?

Tore
Re: It's the end of the world as we know it -- REM
* Mikael Abrahamsson
> On Wed, 24 Apr 2013, Tore Anderson wrote:
>>> I have already read the news of blackmarket sales of network
>>> allocations in Europe.
>> Interesting. Do you have a link or some other kind of reference?
> http://www.ripe.net/lir-services/resource-management/listing is a
> white market sales place. Perhaps that's what the previous poster
> meant. Searching for IPv4 broker yields a lot of results as well,
> that might be the black market though.

White market transfers have been allowed in the RIPE region since
late 2008, cf. http://www.ripe.net/ripe/policies/proposals/2007-08.
There's no requirement that the transferred space is put on the NCC's
listing service first - you can use a broker to arrange it if you
want, or do it completely in private.

For a transfer not to be white, the transaction would need to happen
without the NCC's knowledge and blessing. That blessing implies
validation of the receiver's operational need for the allocation, and
an update of the registry/database to reflect the new holder. I'm
genuinely interested in reading articles or other research
documenting that such black market transfers are happening (or not).

Tore
Re: It's the end of the world as we know it -- REM
* Chris Grundemann
> Nope, you are correct Geoff. There is a /10 reserved for transition
> technologies (e.g. outside addresses on a CGN) and there is a
> critical infrastructure reserve, but no general purpose reserve like
> in RIPE and APNIC.

One interesting thing is that this /10 is dedicated specifically to
transition to/deployment of *IPv6*. So the way I understand it, you
won't get any space from this block to number the outside of a
NAT444-style CGN, while you would for a NAT64-style CGN.

https://www.arin.net/policy/nrpm.html#four10

Tore
Re: It's the end of the world as we know it -- REM
* Andrew Latham
> If I can walk around a smallish town and point at 5 businesses like
> this its a possible solution. I am not claiming a few /24s will do,
> I am claiming that there are many (for larger values of many)
> companies like this.

There are certainly thousands or even millions of unused IPv4
addresses in existence. But reclaiming and redistributing them, which
would be a colossal undertaking, would only push back IPv4 depletion
by a few months. It's simply not worth the effort.

>>> I have already read the news of blackmarket sales of network
>>> allocations in Europe.
>> Interesting. Do you have a link or some other kind of reference?
> I did a quick search and they are easy to find. Many news articles
> about Microsoft buying network allocations at auction to set a price
> of ~$11USD per IP. One tangent article that I liked was
> http://www.datacenterknowledge.com/archives/2012/07/16/ipv4-addresses-now-driving-hosting-deals/

Sure, there's a market all right. However, the well publicised
Microsoft/Nortel transfer wasn't a black market transfer, it was done
in accordance with the ARIN community's policies. Straight from the
horse's mouth: https://www.arin.net/about_us/media/releases/20110415.html

Such transfers are also permitted by the community's policies in the
RIPE region, and the NCC maintains a public list of all such
legit/white transfers that have taken place:
https://www.ripe.net/lir-services/resource-management/ipv4-transfers/table-of-transfers

That article mentions a black market, but it falls short of providing
any tangible evidence that it really exists, or to what extent - it
appears to me to be more speculation and conjecture than anything
else.

That said - such speculation may well turn out to be correct, of
course, and being involved in the RIPE community I'm genuinely
interested in the topic. Therefore I was hoping you'd point me in the
direction of the news of blackmarket sales of network allocations in
Europe you mentioned you have read.

Tore
Re: RPKI Support on the Juniper SRX line
* Carlos M. martinez
> the partner insists that Junos 12.3 / 13.1 supports RPKI on the SRX
> line.

JUNOS 12.3 and 13.1 aren't supported on SRX at all. From e.g.
http://www.juniper.net/support/downloads/?p=srx5600 :

«High: Junos OS Release 12.2, 12.3 and 13.1 are not supported On SRX
Series, J Series, LN1000 and WXC-ISM-200 (PSN-2012-09-707).»

Tore
Re: Verizon DSL moving to CGN
* Owen DeLong
> The need for CGN is not divorced from the failure to deploy IPv6, it
> is caused by it.

In a historical context, this is true enough. If we had accomplished
ubiquitous IPv6 deployment ten years ago, there would be no IPv4
depletion, and there would be no CGN. However, that ship sailed long
ago. You're using the present tense where you should have used the
past.

I was responding to Mikael's claim that pushing content providers to
deploy IPv6 is orthogonal to the need for CGN. If we put down the
history books and focus on today's operational realities, it *is*
orthogonal. If you're an ISP fresh out of IPv4 addresses today,
pushing content providers to deploy IPv6 is simply not a realistic
strategy to deal with it. CGN is.

> Clearly your statement here indicates that you see my point that it
> is NOT orthogonal, but, in fact the failure of content providers to
> deploy IPv6 _IS_ the driving cause for CGN.

I'm not sure why you are singling out content providers, BTW. There is
no shortage of other things out there that have an absolute hard
requirement on IPv4 to function properly. Gaming consoles, Android
phones and tablets, iOS phones and tablets[1], home gateways, software
and apps, embedded devices, ... - the list goes on and on.

If the only missing piece of the puzzle was the lack of IPv6 support
on the content providers' side, IPv6+NAT64 would constitute a
perfectly viable residential/cellular internet service. As far as I
know, however, not a single provider is seriously considering this
strategy going forward. That's telling.

Tore

[1] From what I hear, anyway. They used to work fine on IPv6-only
wireless networks, I've seen it myself, but I've been told that it's
taken a turn for the worse over the course of the last year.
Re: Verizon DSL moving to CGN
* Mikael Abrahamsson
> On Mon, 8 Apr 2013, Rajiv Asati (rajiva) wrote:
>> MAP is all about stateless (NAT64 or Encapsulation) and IPv6
>> enabled access. MAP makes much more sense in any SP network having
>> its internet customers do IPv4 address sharing and embrace IPv6.
> It's still NAT.

AIUI, the standards-track flavour of MAP, MAP-E, is *not* NAT - it is
tunneling: pure encap/decap plus a clever way to calculate the outer
IPv6 src/dst addresses from the inner IPv4 addresses and ports. The
inner IPv4 packets are not modified by the centralised MAP tunneling
routers, so there is no Network Address Translation being performed.

The tunnel endpoint will in 99.99% of cases be a CPE with a NAPT44
component though, so there is some NAT involved in the overall
solution, but it's pretty much the same as what we have in today's
CPEs/HGWs. The only significant difference is that a MAP CPE must be
prepared to not be able to use all 65536 source ports.

Tore
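The restricted port sets mentioned above come from the MAP
port-mapping algorithm (later published as RFC 7597). A small Python
sketch (my own illustration; the PSID length and offset are example
values - a real MAP domain's parameters come from its mapping rules):

```python
def map_ce_ports(psid: int, psid_len: int, offset: int = 6):
    """Enumerate the source ports a MAP CE with the given PSID may
    use, following the MAP port-mapping algorithm with the default
    6-bit port offset."""
    m = 16 - offset - psid_len  # contiguous low-order port bits
    ports = []
    # a = 0 is skipped, which excludes the well-known ports 0-1023
    # from every CE's port set.
    for a in range(1, 2 ** offset):
        base = (a << (16 - offset)) | (psid << m)
        ports.extend(range(base, base + 2 ** m))
    return ports

# A 2-bit PSID means a 4:1 sharing ratio; each CE gets roughly a
# quarter of the usable ports, scattered across 63 ranges.
ports = map_ce_ports(psid=3, psid_len=2)
print(len(ports))  # 16128 ports out of 65536
print(min(ports))  # 1792 - nothing below 1024 is ever assigned
```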