Re: Emulating a cellular interface
On Sat, 6 Nov 2010, Saqib Ilyas wrote: A friend of mine is doing some testing where he wishes to emulate a cellular-like interface with random drops and all, out of an ethernet interface. Since we have plenty of network and system ops on the list, I thought we might have luck posting the question here. I would say that a cellular interface doesn't really have random drops; if it does, then there is something wrong with it (at least if it's UMTS). A UMTS network has RLC re-transmits all the way between the RNC and the mobile terminal. This means that a UMTS connection has different characteristics: packets flow, then stall for hundreds of milliseconds or seconds (or even minutes; my record is 240 seconds), and then arrive in a burst. Very jittery, but a very low rate of packet loss. Also, the terminal might go into idle, meaning that if you send a packet to it, it'll take 1-2 seconds for it to come out of idle and have cellular resources allocated, and then the packets start flowing. I don't have a good idea how to emulate this; the tools I've seen so far usually just emulate jitter, delay and packet loss, and not really the above behaviours. -- Mikael Abrahamsson email: swm...@swm.pp.se
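On Linux, tc/netem can get part of the way toward the behaviours Mikael describes. A rough sketch follows; the interface name and all the numbers are assumptions, not calibrated values, and heavily correlated jitter only approximates the stall-and-burst pattern rather than reproducing it:

```shell
# Illustrative only: approximate a UMTS-ish link on Linux with netem.
# "eth0" and every number below are assumptions, not measured values.
# High delay with heavy, correlated jitter (so delays cluster into
# stall-like episodes), and near-zero random loss per Mikael's point.
tc qdisc add dev eth0 root handle 1: netem delay 300ms 250ms 75% loss 0.1%
# Optionally cap the rate to something 3G-like with a tbf child qdisc:
tc qdisc add dev eth0 parent 1:1 handle 10: tbf rate 2mbit burst 32kbit latency 400ms
```

This still won't reproduce the multi-second RLC stalls followed by a burst, or the 1-2 second idle wake-up; those would need a script changing the qdisc parameters over time.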
Re: Emulating a cellular interface
On 11/6/2010 1:53 AM, Saqib Ilyas wrote: Greetings NANOGers A friend of mine is doing some testing where he wishes to emulate a cellular-like interfaces with random drops and all, out of an ethernet interface. Since we have plenty of network and system ops on the list, I thought we might have luck posting the question here. Is anyone aware of a simple tool to do this, other than rather involved configuration of iptables? Thanks and best regards Take an old Cisco hub, and a hammer. Hit one with the other until you get the desired result! Cheers! Andrew
Re: Emulating a cellular interface
lol Andrew. Mikael, yea, it's something like that we're trying to emulate. On Sat, Nov 6, 2010 at 11:13 AM, Andrew Kirch trel...@trelane.net wrote: On 11/6/2010 1:53 AM, Saqib Ilyas wrote: Greetings NANOGers A friend of mine is doing some testing where he wishes to emulate a cellular-like interface with random drops and all, out of an ethernet interface. Since we have plenty of network and system ops on the list, I thought we might have luck posting the question here. Is anyone aware of a simple tool to do this, other than rather involved configuration of iptables? Thanks and best regards Take an old Cisco hub, and a hammer. Hit one with the other until you get the desired result! Cheers! Andrew -- Muhammad Saqib Ilyas PhD Student, Computer Science and Engineering Lahore University of Management Sciences
Re: RINA - scott whaps at the nanog hornets nest :-)
Subject: RINA - scott whaps at the nanog hornets nest :-) Date: Fri, Nov 05, 2010 at 03:32:30PM -0700 Quoting Scott Weeks (sur...@mauigateway.com): It's really quiet in here. So, for some Friday fun let me whap at the hornets nest and see what happens... ;-) http://www.ionary.com/PSOC-MovingBeyondTCP.pdf This tired bumblebee concludes that another instance of "two bypassed computer scientists who are angry that ISO OSI didn't catch on gripe about this, and call IP, esp. IPv6, names in an effort to taint it" isn't enough to warrant anything but a yawn. More troubling might be http://www.iec62379.org/ and what they (I think they are ATM advocates of the most bellheaded form) are trying to push into an ISO standard. Including gems like "Research during the decade leading up to 2010 shows that the connectionless packet switching paradigm that is inherent in Internet Protocol is unsuitable for an increasing proportion of the traffic on the Internet." Sic! Now that is something to bite into. -- Måns Nilsson primary/secondary/besserwisser/machina MN-1334-RIPE +46 705 989668 Do I have a lifestyle yet?
Re: NAP of the Capital Region
On Nov 5, 2010, at 3:19 PM, Santino Codispoti wrote: Does anyone have an up to date list of the carriers that are within the NAP of the Capital Region? Abovenet AT&T Level3 Verizon TATA Cogent (I am being told is joining or has just joined).. Terremark's transit network I actually have a decent size deployment there. Culpeper is a lovely city; you will enjoy it there. Let me know if you need more info, i am considered culpeper / nap cr local according to many people.. ;) mehmet
Re: BGP support on ASA5585-X
On Fri, 2010-11-05 at 21:50 -0500, Tony Varriale wrote: somebody said: They could make it out of the box but this is why Dylan made his statement. His statement is far fetched at best. Unless of course he's speaking of 100 million line ACLs. Can I just ask out of technical curiosity: Q: What is considered a large number of ACL lines for these recent ASA boxes? I realise it depends, so I'm looking for a loose ball-park response (or preferably a rule-of-thumb equation?) background to the question: I have several special purpose BSD boxes that have several hundred lines of PF filtering rules (the equivalent of a Cisco ACL line). One has nearly 2300. These are consolidated with macros (PF anchors/tables) and dynamic rulesets, so they are already highly optimised. The rules are in addition to the shaping and anti-spoofing, and these boxes sit in a critical location in a (very sensitive) very complex network. I'm just wondering if this is a lot in the world of recent ASAs, having had no relevant experience with them (at this level) Gord -- soul for sale - apply within
Re: IPv6 fc00::/7 - Unique local addresses
On 11/1/10 9:42 PM, Nathan Eisenberg wrote: My guess is that the millions of residential users will be less and less enthused with (pure) PA each time they change service providers... Hi, almost every time I open my laptop it gets a different IP address; sometimes I'm home and it gets that same IP it had the last time I was there. Once in a very great while my home gateway changes its IP address; since it does dynamic DNS updates I have no trouble finding it, and in fact the network storage peripheral that I have thinks nothing of using UPnP to punch a hole in the gateway all on its own... These are mostly low touch, with either zero or very little configuration required. The consumer isn't going to tolerate systems which don't work automatically or very nearly so, which means among other things having an IP address change or even having your home gateway(s) help the downstream devices renumber themselves. That claim seems to be unsupported by current experience. Please elaborate. It's not only not supported by current experience, it seems really unlikely to be supported by future experience; e.g. many devices you have are mobile and connected to more than one network at a time, and they still may be gatewaying for downstream devices. If somewhere along the line you solved this for the mobility case, it is going to work for networks which are more stable. Nathan
Re: Emulating a cellular interface
On 6 Nov 2010, at 05:53, Saqib Ilyas wrote: A friend of mine is doing some testing where he wishes to emulate a cellular-like interface with random drops and all, out of an ethernet interface. Since we have plenty of network and system ops on the list, I thought we might have luck posting the question here. Is anyone aware of a simple tool to do this, other than rather involved configuration of iptables? Notwithstanding Mikael's comments that it shouldn't be lossy, at times when you want to simulate lossy (and jittery, and shaped, and so on) conditions, the best way I have found to do this is FreeBSD's dummynet: http://www.freebsd.org/cgi/man.cgi?query=ipfw&sektion=8#TRAFFIC_SHAPER_(DUMMYNET)_CONFIGURATION Andy
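For reference, a minimal dummynet sketch along the lines of the man page Andy cites. The interface name and all numbers here are assumptions, not a calibrated cellular model:

```shell
# Illustrative FreeBSD dummynet sketch: push all traffic through one
# pipe that shapes, delays, and randomly drops. "em0" is an assumption.
ipfw add 100 pipe 1 ip from any to any via em0
# 2 Mbit/s, 300 ms one-way delay, 0.1% packet loss rate, 50-slot queue.
ipfw pipe 1 config bw 2Mbit/s delay 300ms plr 0.001 queue 50
```

Separate pipes (or ipfw rules matching each direction) can give asymmetric up/down behaviour if needed.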
Re: BGP support on ASA5585-X
- Original Message - From: gordon b slater gordsla...@ieee.org To: Tony Varriale tvarri...@comcast.net Cc: nanog@nanog.org Sent: Saturday, November 06, 2010 4:38 AM Subject: Re: BGP support on ASA5585-X On Fri, 2010-11-05 at 21:50 -0500, Tony Varriale wrote: somebody said: They could make it out of the box but this is why Dylan made his statement. His statement is far fetched at best. Unless of course he's speaking of 100 million line ACLs. Can I just ask out of technical curiosity: Well, let me preface this thread with: the previous poster was/is from a hosting company. ASAs aren't ISP/Hosting level boxes. They are SMB to enterprise boxes. It's like saying yeah that 2501 doesn't meet our customer agg requirements at our ISP. Of course it doesn't. Wrong product wrong solution. With that said, from what I see in the field 10s of thousands. I've seen as high as 80k. But, once you get into that many ACLs, IMO there's either an ACL or security/network design problem. tv
Re: RINA - scott whaps at the nanog hornets nest :-)
On Fri, 5 Nov 2010 21:40:30 -0400 Marshall Eubanks t...@americafree.tv wrote: On Nov 5, 2010, at 7:26 PM, Mark Smith wrote: On Fri, 5 Nov 2010 15:32:30 -0700 Scott Weeks sur...@mauigateway.com wrote: It's really quiet in here. So, for some Friday fun let me whap at the hornets nest and see what happens... ;-) http://www.ionary.com/PSOC-MovingBeyondTCP.pdf Whoever wrote that doesn't know what they're talking about. LISP is not the IETF's proposed solution (the IETF don't have one, the IRTF do), Um, I would not agree. The IRTF RRG considered and is documenting a lot of things, but did not come to any consensus as to which one should be a proposed solution. I probably got a bit keen; I've been reading through the IRTF RRG Recommendation for a Routing Architecture draft which, IIRC, makes a recommendation to pursue the Identifier/Locator Network Protocol rather than LISP. Regards, Mark. Regards Marshall and streaming media was seen to be one of the early applications of the Internet - these types of applications are why TCP was split out of IP, why UDP was invented, and why UDP has a significantly different protocol number to TCP. -- NAT is your friend IP doesn’t handle addressing or multi-homing well at all The IETF’s proposed solution to the multihoming problem is called LISP, for Locator/Identifier Separation Protocol. This is already running into scaling problems, and even when it works, it has a failover time on the order of thirty seconds. TCP and IP were split the wrong way IP lacks an addressing architecture Packet switching was designed to complement, not replace, the telephone network. IP was not optimized to support streaming media, such as voice, audio broadcasting, and video; it was designed to not be the telephone network. -- And so, ...the first principle of our proposed new network architecture: Layers are recursive. I can hear the angry hornets buzzing already. :-) scott
Re: RINA - scott whaps at the nanog hornets nest :-)
On 11/5/2010 5:32 PM, Scott Weeks wrote: It's really quiet in here. So, for some Friday fun let me whap at the hornets nest and see what happens...;-) http://www.ionary.com/PSOC-MovingBeyondTCP.pdf SCTP is a great protocol. It has already been implemented in a number of stacks. With these benefits over that theory, it still hasn't become mainstream yet. People are against change. They don't want to leave v4. They don't want to leave tcp/udp. Technology advances, but people will only change when they have to. Jack (lost brain cells actually reading that pdf)
RE: RINA - scott whaps at the nanog hornets nest :-)
Sent: Saturday, November 06, 2010 9:45 AM To: nanog@nanog.org Subject: Re: RINA - scott whaps at the nanog hornets nest :-) On 11/5/2010 5:32 PM, Scott Weeks wrote: It's really quiet in here. So, for some Friday fun let me whap at the hornets nest and see what happens...;-) http://www.ionary.com/PSOC-MovingBeyondTCP.pdf SCTP is a great protocol. It has already been implemented in a number of stacks. With these benefits over that theory, it still hasn't become mainstream yet. People are against change. They don't want to leave v4. They don't want to leave tcp/udp. Technology advances, but people will only change when they have to. Jack (lost brain cells actually reading that pdf) I believe SCTP will become more widely used in the mobile device world. You can have several different streams so you can still get an IM, for example, while you are streaming a movie. Eliminating the head-of-line blocking on thin connections is really valuable. It would be particularly useful where you have different types of traffic from a single destination. File transfer, for example, might be a good application where one might wish to issue interactive commands to move around the directory structure while a large file transfer is taking place. If you really want to shake a hornet's nest, try getting people to get rid of this idiotic 1500 byte MTU in the middle of the internet and try to get everyone to adopt 9000 byte frames as the standard. That change right there would provide a huge performance increase, load reduction on networks and servers, and with a greater number of native ethernet end to end connections, there is no reason to use 1500 byte MTUs. This is particularly true with modern PMTUD methods (such as with modern Linux kernels ... /proc/sys/net/ipv4/tcp_mtu_probing set to either 1 or 2). While the end points should just be what they are, there is no reason for the middle portion, the long haul transport part, to be MTU 1500. http://staff.psc.edu/mathis/MTU/
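The Linux knob George refers to can be set like this; a sketch, with value semantics as described in the kernel's ip-sysctl documentation:

```shell
# RFC 4821 packetization-layer PMTU probing on Linux:
#   0 = disabled, 1 = probe only after a blackhole is suspected,
#   2 = always probe.
sysctl -w net.ipv4.tcp_mtu_probing=1
# Starting MSS used when probing kicks in (a conservative value):
sysctl -w net.ipv4.tcp_base_mss=1024
```

With mode 1, the stack falls back to probing only when retransmissions suggest an ICMP black hole, so there is little cost on well-behaved paths.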
Re: NAP of the Capital Region
Thank you On Sat, Nov 6, 2010 at 3:12 AM, Mehmet Akcin meh...@akcin.net wrote: On Nov 5, 2010, at 3:19 PM, Santino Codispoti wrote: Does anyone have an up to date list of the carriers that are within the NAP of the Capital Region? Abovenet Att Level3 Verizon TATA Cogent (I am being told is joining or has just joined).. Terremark's transit network I actually have a decent size deployment there. Culpeper is a lovely city you will enjoy there. let me know if you need more info, i am considered culpeper / nap cr local according to many people.. ;) mehmet
RE: RINA - scott whaps at the nanog hornets nest :-)
Le samedi 06 novembre 2010 à 12:15 -0700, George Bonser a écrit : Sent: Saturday, November 06, 2010 9:45 AM To: nanog@nanog.org Subject: Re: RINA - scott whaps at the nanog hornets nest :-) On 11/5/2010 5:32 PM, Scott Weeks wrote: It's really quiet in here. So, for some Friday fun let me whap at the hornets nest and see what happens...;-) http://www.ionary.com/PSOC-MovingBeyondTCP.pdf SCTP is a great protocol. It has already been implemented in a number of stacks. With these benefits over that theory, it still hasn't become mainstream yet. People are against change. They don't want to leave v4. They don't want to leave tcp/udp. Technology advances, but people will only change when they have to. Jack (lost brain cells actually reading that pdf) I believe SCTP will become more widely used in the mobile device world. You can have several different streams so you can still get an IM, for example, while you are streaming a movie. Eliminating the head of line blockage on thin connections is really valuable. It would be particularly useful where you have different types of traffic from a single destination. File transfer, for example, might be a good application where one might wish to issue interactive commands to move around the directory structure while a large file transfer is taking place. If you really want to shake a hornet's nest, try getting people to get rid of this idiotic 1500 byte MTU in the middle of the internet I doubt that 1500 is (still) widely used in our Internet... Might be, though, that most of us don't go all the way to 9k. mh and try to get everyone to adopt 9000 byte frames as the standard. That change right there would provide a huge performance increase, load reduction on networks and servers, and with a greater number of native ethernet end to end connections, there is no reason to use 1500 byte MTUs. This is particularly true with modern PMUT methods (such as with modern Linux kernels ... 
/proc/sys/net/ipv4/tcp_mtu_probing set to either 1 or 2). While the end points should just be what they are, there is no reason for the middle portion, the long haul transport part, to be MTU 1500. http://staff.psc.edu/mathis/MTU/
RE: RINA - scott whaps at the nanog hornets nest :-)
I doubt that 1500 is (still) widely used in our Internet... Might be, though, that most of us don't go all the way to 9k. mh Last week I asked the operator of fairly major public peering points if they supported anything larger than 1500 MTU. The answer was no.
Re: RINA - scott whaps at the nanog hornets nest :-)
On Sat, Nov 6, 2010 at 12:32 PM, George Bonser gbon...@seven.com wrote: I doubt that 1500 is (still) widely used in our Internet... Might be, though, that most of us don't go all the way to 9k. mh Last week I asked the operator of fairly major public peering points if they supported anything larger than 1500 MTU. The answer was no. There's still a metric buttload of SONET interfaces in the core that won't go above 4470. So, you might conceivably get 4k MTU at some point in the future, but it's really, *really* unlikely you'll get to 9k MTU any time in the next decade. Matt
RE: RINA - scott whaps at the nanog hornets nest :-)
There's still a metric buttload of SONET interfaces in the core that won't go above 4470. So, you might conceivably get 4k MTU at some point in the future, but it's really, *really* unlikely you'll get to 9k MTU any time in the next decade. Matt Agreed. But even 4470 is better than 1500. 1500 was fine for 10G ethernet, it is actually pretty silly for GigE and better. This survey that Dykstra did back in 1999 points out exactly what you mentioned: http://sd.wareonearth.com/~phil/jumbo.html And that was over a decade ago. There is no reason, in my opinion, for the various peering points to be a 1500 byte bottleneck in a path that might otherwise be larger. Increasing that from 1500 to even 3000 or 4500 gives a measurable performance boost over high latency connections such as from Europe to APAC or Western US. This is not to mention a reduction in the number of ACK packets flying back and forth across the Internet and a general reduction in the number of packets that must be processed for a given transaction.
RE: RINA - scott whaps at the nanog hornets nest :-)
1500 was fine for 10G I meant, of course, 10M ethernet.
RE: RINA - scott whaps at the nanog hornets nest :-)
Last week I asked the operator of fairly major public peering points if they supported anything larger than 1500 MTU. The answer was no. There's still a metric buttload of SONET interfaces in the core that won't go above 4470. So, you might conceivably get 4k MTU at some point in the future, but it's really, *really* unlikely you'll get to 9k MTU any time in the next decade. Matt There is no reason why we are still using 1500 byte MTUs at exchange points. From Dykstra's paper (note that this was written in 1999 before wide deployment of GigE): (quote) Does GigE have a place in a NAP? Not if it reduces the available MTU! Network Access Points (NAPs) are at the very core of the internet. They are where multiple wide area networks come together. A great deal of internet paths traverse at least one NAP. If NAPs put a limitation on MTU, then all WANs, LANs, and end systems that traverse that NAP are subject to that limitation. There is nothing the end systems could do to lift the performance limit imposed by the NAP's MTU. Because of their critically important place in the internet, NAPs should be doing everything they can to remove performance bottlenecks. They should be among the most permissive nodes in the network as far as the parameter space they make available to network applications. The economic and bandwidth arguments for GigE NAPs however are compelling. Several NAPs today are based on switched FDDI (100 Mbps, 4 KB MTU) and are running out of steam. An upgrade to OC3 ATM (155 Mbps, 9 KB MTU) is hard to justify since it only provides a 50% increase in bandwidth. And trying to install a switch that could support 50+ ports of OC12 ATM is prohibitively expensive! A 64 port GigE switch however can be had for about $100k and delivers 50% more bandwidth per port at about 1/3 the cost of OC12 ATM. The problem however is 1500 byte frames, but GigE with jumbo frames would permit full FDDI MTU's and only slightly reduce a full Classical IP over ATM MTU (9180 bytes). 
A recent example comes from the Pacific Northwest Gigapop in Seattle which is based on a collection of Foundry gigabit ethernet switches. At Supercomputing '99, Microsoft and NCSA demonstrated HDTV over TCP at over 1.2 Gbps from Redmond to Portland. In order to achieve that performance they used 9000 byte packets and thus had to bypass the switches at the NAP! Let's hope that in the future NAPs don't place 1500 byte packet limitations on applications. (end quote) Raising the MTU of the ethernet connections at the exchange point above 1500 will not in any way adversely impact the traffic on the path. If the end points are already at 1500, this change is completely transparent to them. If the end points are capable of more than 1500, then it would allow the flow to increase its packet sizes and reduce the number of packets flowing through the network and give a huge gain in performance, even in the face of packet loss.
Re: RINA - scott whaps at the nanog hornets nest :-)
On Sat, Nov 06, 2010 at 12:32:55PM -0700, George Bonser wrote: I doubt that 1500 is (still) widely used in our Internet... Might be, though, that most of us don't go all the way to 9k. Last week I asked the operator of fairly major public peering points if they supported anything larger than 1500 MTU. The answer was no. It would be absolutely trivial for them to enable jumbo frames, there is just no demand for them to do so, as supporting Internet wide jumbo frames (particularly over exchange points) is highly non-scalable in practice. It's perfectly safe to have the L2 networks in the middle support the largest MTU values possible (other than maybe triggering an obscure Force10 bug or something :P), so they could roll that out today and you probably wouldn't notice. The real issue is with the L3 networks on either end of the exchange, since if the L3 routers that are trying to talk to each other don't agree about their MTU values precisely, packets are blackholed. There are no real standards for jumbo frames out there, every vendor (and in many cases particular type/revision of hardware made by that vendor) supports a slightly different size. There is also no negotiation protocol of any kind, so the only way to make these two numbers match precisely is to have the humans on both sides talk to each other and come up with a commonly supported value. There are two things that make this practically impossible to support at scale, even ignoring all of the grief that comes from trying to find a clueful human to talk to on the other end of your connection to a third party (which is a huge problem in and of itself): #1. There is currently no mechanism on any major router to set multiple MTU values PER NEXTHOP on a multi-point exchange, so to do jumbo frames over an exchange you would have to pick a single common value that EVERYONE can support. 
This also means you can't mix and match jumbo and non-jumbo participants over the same exchange, you essentially have to set up an entirely new exchange point (or vlan within the same exchange) dedicated to the jumbo frame support, and you still have to get a common value that everyone can support. Ironically many routers (many kinds of Cisco and Juniper routers at any rate) actually DO support per-nexthop MTUs in hardware, there is just no mechanism exposed to the end user to configure those values, let alone auto-negotiate them. #2. The major vendors can't even agree on how they represent MTU sizes, so entering the same # into routers from two different vendors can easily result in incompatible MTUs. For example, on Juniper when you type mtu 9192, this is INCLUSIVE of the L2 header, but on Cisco the opposite is true. So to make a Cisco talk to a Juniper that is configured 9192, you would have to configure mtu 9178. Except it's not even that simple, because now if you start adding vlan tagging the L2 header size is growing. If you now configure vlan tagging on the interface, you've got to make the Cisco side 9174 to match the Juniper's 9192. And if you configure flexible-vlan-tagging so you can support q-in-q, you've now got to configure the Cisco side for 9170. As an operator who DOES fully support 9k+ jumbos on every internal link in my network, and as many external links as I can find clueful people to talk to on the other end to negotiate the correct values, let me just tell you this is a GIANT PAIN IN THE ASS. And we're not even talking about making sure things actually work right for the end user. Your IGP may not come up at all if the MTUs are misconfigured, but EBGP certainly will, even if the two sides are actually off by a few bytes. 
The maximum size of a BGP message is 4096 octets, and there is no mechanism to pad a message and try to detect MTU incompatibility, so what will actually happen in real life is the end user will try to send a big jumbo frame through and find that some of their packets are randomly and silently blackholed. This would be an utter nightmare to support and diagnose. Realistically I don't think you'll ever see even a serious attempt at jumbo frame support implemented in any kind of scale until there is a negotiation protocol and some real standards for the mtu size that must be supported, which is something that no standards body (IEEE, IETF, etc) has seemed inclined to deal with so far. Of course all of this is based on the assumption that path mtu discovery will work correctly once the MTU values ARE correctly configured on the L3 routers, which is a pretty huge assumption, given all the people who stupidly filter ICMP. Oh and even if you solved all of those problems, I could trivially DoS your router with some packets that would overload your ability to generate ICMP Unreach Needfrag messages for PMTUD, and then all your jumbo frame end users going through that router would be blackholed as well. Great idea in theory, epic disaster in practice, at least given
RE: RINA - scott whaps at the nanog hornets nest :-)
Completely agree with you on that point. I'd love to see Equinix, AMSIX, LINX, DECIX, and the rest of the large exchange points put out statements indicating their ability to transparently support jumbo frames through their fabrics, or at least indicate a roadmap and a timeline to when they think they'll be able to support jumbo frames throughout the switch fabrics. Matt Yes, in moving from SONET to Ethernet exchange points, we have actually reduced the potential performance of applications across the network for no good reason, in many cases.
Re: RINA - scott whaps at the nanog hornets nest :-)
On 11/6/2010 3:36 PM, Richard A Steenbergen wrote: #2. The major vendors can't even agree on how they represent MTU sizes, so entering the same # into routers from two different vendors can easily result in incompatible MTUs. For example, on Juniper when you type mtu 9192, this is INCLUSIVE of the L2 header, but on Cisco the opposite is true. So to make a Cisco talk to a Juniper that is configured 9192, you would have to configure mtu 9178. Except it's not even that simple, because now if you start adding vlan tagging the L2 header size is growing. If you now configure vlan tagging on the interface, you've got to make the Cisco side 9174 to match the Juniper's 9192. And if you configure flexible-vlan-tagging so you can support q-in-q, you've now got to configure the Cisco side for 9170. I agree with the rest, but actually, I've found that Juniper has a manual physical MTU with a separate logical MTU available, while Cisco sets a logical MTU and autocalculates the physical MTU (or perhaps the physical is just hard set to maximum). It depends on the equipment with Cisco, though. L3 and L2 interfaces treat MTU differently, especially noticeable when doing q-in-q on default switches without adjusting the MTU. It is also noticeable in the MTU setting methods on a c7600 (L2 vs L3 methods). In practice, I think you can actually pop the physical MTU on the Juniper much higher than necessary, so long as you set the family-based logical MTUs at the appropriate value. Jack
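The vendor accounting Richard and Jack describe works out as simple header arithmetic. A quick sketch (header sizes per standard untagged and 802.1Q ethernet framing):

```shell
# Juniper's "mtu 9192" includes the L2 header; Cisco's interface MTU
# does not, so the matching Cisco value shrinks as VLAN tags are added.
juniper_mtu=9192
l2_header=14      # untagged ethernet: dst MAC (6) + src MAC (6) + ethertype (2)
vlan_tag=4        # each 802.1Q tag adds 4 bytes
cisco_untagged=$((juniper_mtu - l2_header))              # 9178
cisco_tagged=$((juniper_mtu - l2_header - vlan_tag))     # 9174
cisco_qinq=$((juniper_mtu - l2_header - 2 * vlan_tag))   # 9170
echo "$cisco_untagged $cisco_tagged $cisco_qinq"
```

These reproduce the 9178/9174/9170 figures from Richard's post.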
Re: RINA - scott whaps at the nanog hornets nest :-)
Le samedi 06 novembre 2010 à 13:01 -0700, Matthew Petach a écrit : On Sat, Nov 6, 2010 at 12:32 PM, George Bonser gbon...@seven.com wrote: I doubt that 1500 is (still) widely used in our Internet... Might be, though, that most of us don't go all the way to 9k. mh Last week I asked the operator of fairly major public peering points if they supported anything larger than 1500 MTU. The answer was no. There's still a metric buttload of SONET interfaces in the core that won't go above 4470. So, you might conceivably get 4k MTU at some point in the future, but it's really, *really* unlikely you'll get to 9k MTU any time in the next decade. Right, though I'm unsure of decade since we're moving off SDH/Sonet quite aggressively. mh Matt
RE: RINA - scott whaps at the nanog hornets nest :-)
It's perfectly safe to have the L2 networks in the middle support the largest MTU values possible (other than maybe triggering an obscure Force10 bug or something :P), so they could roll that out today and you probably wouldn't notice. The real issue is with the L3 networks on either end of the exchange, since if the L3 routers that are trying to talk to each other don't agree about their MTU values precisely, packets are blackholed. There are no real standards for jumbo frames out there, every vendor (and in many cases particular type/revision of hardware made by that vendor) supports a slightly different size. There is also no negotiation protocol of any kind, so the only way to make these two numbers match precisely is to have the humans on both sides talk to each other and come up with a commonly supported value. That is not a new problem. That is also true today with last mile links (e.g. dialup) that don't support a 1500 byte MTU. What is different today is RFC 4821 PMTU discovery, which deals with the black holes. RFC 4821 PMTUD is that negotiation that is lacking. It is there. It is deployed. It actually works. No more relying on someone sending the ICMP packets through in order for PMTUD to work! There are two things that make this practically impossible to support at scale, even ignoring all of the grief that comes from trying to find a clueful human to talk to on the other end of your connection to a third party (which is a huge problem in and of itself): #1. There is currently no mechanism on any major router to set multiple MTU values PER NEXTHOP on a multi-point exchange, so to do jumbo frames over an exchange you would have to pick a single common value that EVERYONE can support. 
This also means you can't mix and match jumbo and non-jumbo participants over the same exchange, you essentially have to set up an entirely new exchange point (or vlan within the same exchange) dedicated to the jumbo frame support, and you still have to get a common value that everyone can support. Ironically many routers (many kinds of Cisco and Juniper routers at any rate) actually DO support per-nexthop MTUs in hardware, there is just no mechanism exposed to the end user to configure those values, let alone auto-negotiate them. Is there any gear connected to a major IX that does NOT support large frames? I am not aware of any manufactured today. Even cheap D-Link gear supports them. I believe you would be hard-pressed to locate gear that doesn't support it at any major IX. Granted, it might require the change of a global config value and a reboot for it to take effect in some vendors. http://darkwing.uoregon.edu/~joe/jumbo-clean-gear.html #2. The major vendors can't even agree on how they represent MTU sizes, so entering the same # into routers from two different vendors can easily result in incompatible MTUs. For example, on Juniper when you type mtu 9192, this is INCLUSIVE of the L2 header, but on Cisco the opposite is true. So to make a Cisco talk to a Juniper that is configured 9192, you would have to configure mtu 9178. Except it's not even that simple, because now if you start adding vlan tagging the L2 header size is growing. If you now configure vlan tagging on the interface, you've got to make the Cisco side 9174 to match the Juniper's 9192. And if you configure flexible-vlan-tagging so you can support q-in-q, you've now got to configure to Cisco side for 9170. Again, the size of the MTU on the IX port doesn't change the size of the packets flowing through that gear. A packet sent from an end point with an MTU of 1500 will be unchanged by the router change. 
A flow to an end point with 1500 MTU will also be adjusted down by PMTU Discovery just as it is now when communicating with a dialup end point that might have 600 MTU. The only thing that is going to change from the perspective of the routers is the communications originated by the router, which will basically just be the BGP session. When the TCP session is established for BGP, the side with the smaller of the two MTUs will advertise an MSS value, which is the largest segment it can accept. The other unit will not send a packet larger than this even if it has a larger MTU. Just because the MTU is 9000 doesn't mean it is going to aggregate 1500 byte packets flowing through it into 9000 byte packets; it is going to pass them through unchanged. As for the configuration differences between units, how does that change from the way things are now? A person configuring a Juniper for 1500 byte packets already must know the difference as that quirk of including the headers is just as true at 1500 bytes as it is at 9000 bytes. Does the operator suddenly become less competent with their gear when they use a different value? Also, a 9000 byte MTU would be a happy value that practically everyone supports these days, including ethernet adaptors on host machines. As an operator who DOES fully support 9k+
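The MSS arithmetic above is simple to sketch (assuming IPv4 with no TCP options, where MSS = MTU minus 40 bytes of IP and TCP headers; the MTU values here are just illustrative):

```shell
# Each side advertises MSS = its MTU minus 40 bytes of IPv4+TCP headers;
# the BGP session is bounded by the smaller of the two advertisements.
mtu_a=9000; mtu_b=1500
mss_a=$((mtu_a - 40)); mss_b=$((mtu_b - 40))
if [ "$mss_a" -lt "$mss_b" ]; then echo "$mss_a"; else echo "$mss_b"; fi   # 1460
```

So a 9000-byte side talking to a 1500-byte side never sends more than 1460 bytes of TCP payload per segment, regardless of its own interface MTU.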
Re: RINA - scott whaps at the nanog hornets nest :-)
On Saturday, 6 November 2010 at 13:29 -0700, Matthew Petach wrote: On Sat, Nov 6, 2010 at 1:22 PM, George Bonser gbon...@seven.com wrote: Last week I asked the operator of fairly major public peering points if they supported anything larger than 1500 MTU. The answer was no. There's still a metric buttload of SONET interfaces in the core that won't go above 4470. So, you might conceivably get 4k MTU at some point in the future, but it's really, *really* unlikely you'll get to 9k MTU any time in the next decade. Matt There is no reason why we are still using 1500 byte MTUs at exchange points. Completely agree with you on that point. I'd love to see Equinix, AMSIX, LINX, DECIX, and the rest of the large exchange points put out statements indicating their ability to transparently support jumbo frames through their fabrics, or at least indicate a roadmap and a timeline to when they think they'll be able to support jumbo frames throughout the switch fabrics. Agree. Some people do: Netnod. ;) (1500 in one option, 4470 in another, part of a single interconnection deal -- unless I'm mistaken about the contractual side of things). mh
Re: RINA - scott whaps at the nanog hornets nest :-)
On Sat, Nov 6, 2010 at 2:21 PM, George Bonser gbon...@seven.com wrote: ... As for the configuration differences between units, how does that change from the way things are now? A person configuring a Juniper for 1500 byte packets already must know the difference as that quirk of including the headers is just as true at 1500 bytes as it is at 9000 bytes. Does the operator suddenly become less competent with their gear when they use a different value? Also, a 9000 byte MTU would be a happy value that practically everyone supports these days, including ethernet adaptors on host machines. While I think 9k for exchange points is an excellent target, I'll reiterate that there's a *lot* of SONET interfaces out there that won't be going away any time soon, so practically speaking, you won't really get more than 4400 end-to-end, even if you set your hosts to 9k as well. And yes, I agree with ras; having routers able to adjust on a per-session basis would be crucial; otherwise, we'd have to ask the peeringdb folks to add a field that lists each participant's interface MTU at each exchange, and part of peermaker would be a check that could warn you, sorry, you can't peer with network X, your MTU is too small. ;-P (though that would make for an interesting de-peering notice...sorry, we will be unable to peer with networks who cannot support large MTUs at exchange point X after this date.) Matt
Re: RINA - scott whaps at the nanog hornets nest :-)
Completely agree with you on that point. I'd love to see Equinix, AMSIX, LINX, DECIX, and the rest of the large exchange points put out statements indicating their ability to transparently support jumbo frames through their fabrics, or at least indicate a roadmap and a timeline to when they think they'll be able to support jumbo frames throughout the switch fabrics. The Netnod IX in Sweden has offered 4470 MTU for many years. From http://www.netnod.se/technical_information.shtml One VLAN handles standard sized Ethernet frames (MTU 1500 bytes) and one handles Ethernet Jumbo frames with MTU-size 4470 bytes. Steinar Haug, Nethelp consulting, sth...@nethelp.no
Re: RINA - scott whaps at the nanog hornets nest :-)
RFC 4821 PMTUD is that negotiation that is lacking. It is there. It is deployed. It actually works. No more relying on someone sending the ICMP packets through in order for PMTUD to work! For some value of works. There are way too many places filtering ICMP for PMTUD to work consistently. PMTUD is *not* the solution, unfortunately. Steinar Haug, Nethelp consulting, sth...@nethelp.no
RE: RINA - scott whaps at the nanog hornets nest :-)
-Original Message- From: sth...@nethelp.no [mailto:sth...@nethelp.no] Sent: Saturday, November 06, 2010 2:40 PM To: George Bonser Cc: r...@e-gerbil.net; nanog@nanog.org Subject: Re: RINA - scott whaps at the nanog hornets nest :-) RFC 4821 PMTUD is that negotiation that is lacking. It is there. It is deployed. It actually works. No more relying on someone sending the ICMP packets through in order for PMTUD to work! For some value of works. There are way too many places filtering ICMP for PMTUD to work consistently. PMTUD is *not* the solution, unfortunately. Steinar Haug, Nethelp consulting, sth...@nethelp.no I guess you missed the part about RFC 4821: that flavor of PMTUD does not rely on ICMP, and works even where ICMP is filtered.
Re: RINA - scott whaps at the nanog hornets nest :-)
On 11/6/2010 4:40 PM, sth...@nethelp.no wrote: For some value of works. There are way too many places filtering ICMP for PMTUD to work consistently. PMTUD is *not* the solution, unfortunately. He was referring to the updated RFC 4821. In the absence of ICMP messages, the proper MTU is determined by starting with small packets and probing with successively larger packets. The bulk of the algorithm is implemented above IP, in the transport layer (e.g., TCP) or other Packetization Protocol that is responsible for determining packet boundaries. It is designed to support working without ICMP. Its drawback is the ramp time, which makes it useless for small transactions, but it can be argued that small transactions don't need larger MTUs. Jack
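The probe ramp described above can be sketched as a toy search. This is pure shell arithmetic, not a real packetization-layer implementation: RFC 4821 leaves the exact search strategy to the implementation, so a binary search between a known-good floor and a ceiling is just one plausible choice, and the 4470-byte path MTU here is hypothetical.

```shell
path_mtu=4470              # hypothetical: largest packet the path actually passes
low=1024                   # a probe size known to get through
high=9000                  # ceiling (the local interface MTU)
while [ $((high - low)) -gt 1 ]; do
  probe=$(((low + high) / 2))
  # a real implementation sends a probe segment here and watches for the ACK
  if [ "$probe" -le "$path_mtu" ]; then low=$probe; else high=$probe; fi
done
echo "$low"                # converges on 4470 without any ICMP feedback
```

The number of round trips this ramp costs is exactly the drawback Jack mentions: it is negligible for a bulk transfer but wasted effort for a short transaction.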
RE: RINA - scott whaps at the nanog hornets nest :-)
While I think 9k for exchange points is an excellent target, I'll reiterate that there's a *lot* of SONET interfaces out there that won't be going away any time soon, so practically speaking, you won't really get more than 4400 end-to-end, even if you set your hosts to 9k as well. Agreed. But in the meantime, removing the 1500 bottlenecks at the ethernet peering ports would at least provide the potential for the connection to scale up to the 4400 available by the SONET links. Right now, nothing is possible above 1500 for most flows that traverse an ethernet peering point. My point is that 1500 is a relic. Put another way, how come PoS at 4400 in the path doesn't break anything currently between endpoints while any suggestion that ethernet be made larger than 1500 in the path causes all this reaction? We already HAVE MTUs larger than 1500 in the middle part of the path. This really doesn't change much of anything from that perspective. For example, simply taking Ethernet to 3000 would still be smaller than SONET and even that would provide measurable benefit. There is a certain but that is the way it has always been done inertia that I believe needs to be overcome. Increasing the path MTU has the potential to greatly improve performance at practically no cost to anyone involved. We are throttling performance of the Internet for no sound technical reason, in my opinion. Now I could see where someone selling jumbo paths at a premium might be reluctant to see the Internet generally go that path as it would decrease their value add, but that is a different story.
RE: RINA - scott whaps at the nanog hornets nest :-)
He was referring to the updated RFC 4821. In the absence of ICMP messages, the proper MTU is determined by starting with small packets and probing with successively larger packets. The bulk of the algorithm is implemented above IP, in the transport layer (e.g., TCP) or other Packetization Protocol that is responsible for determining packet boundaries. It is designed to support working without ICMP. Its drawback is the ramp time, which makes it useless for small transactions, but it can be argued that small transactions don't need larger MTUs. Jack That is also somewhat mitigated in that it operates in two modes. The first mode is what I would call passive mode and only comes into play once a black hole is detected. It does not change the operation of TCP until a packet disappears. The second is the active mode, where it actively probes with increasing packet sizes until it hits a black hole or gets an ICMP response.
Re: RINA - scott whaps at the nanog hornets nest :-)
RFC 4821 PMTUD is that negotiation that is lacking. It is there. It is deployed. It actually works. No more relying on someone sending the ICMP packets through in order for PMTUD to work! For some value of works. There are way too many places filtering ICMP for PMTUD to work consistently. PMTUD is *not* the solution, unfortunately. I guess you missed the part about RFC 4821: that flavor of PMTUD does not rely on ICMP, and works even where ICMP is filtered. As long as the implementations are few and far between: https://www.psc.edu/~mathis/MTU/ http://www.ietf.org/mail-archive/web/rrg/current/msg05816.html the traditional ICMP-based PMTUD is what most of us face today. Steinar Haug, Nethelp consulting, sth...@nethelp.no
Re: RINA - scott whaps at the nanog hornets nest :-)
On 11/6/2010 4:52 PM, George Bonser wrote: That is also somewhat mitigated in that it operates in two modes. The first mode is what I would call passive mode and only comes into play once a black hole is detected. It does not change the operation of TCP until a packet disappears. The second method is the active mode where it actively probes with increasing packet sizes until it hits a black hole or gets an ICMP response. While it reads well, what implementations are actually in use? As with most protocols, it is useless if it doesn't have a high penetration. Jack
Re: RINA - scott whaps at the nanog hornets nest :-)
On 06/11/10 15:56 -0500, Jack Bates wrote: On 11/6/2010 3:36 PM, Richard A Steenbergen wrote: #2. The major vendors can't even agree on how they represent MTU sizes, so entering the same # into routers from two different vendors can easily result in incompatible MTUs. For example, on Juniper when you type mtu 9192, this is INCLUSIVE of the L2 header, but on Cisco the opposite is true. So to make a Cisco talk to a Juniper that is configured 9192, you would have to configure mtu 9178. Except it's not even that simple, because now if you start adding vlan tagging the L2 header size is growing. If you now configure vlan tagging on the interface, you've got to make the Cisco side 9174 to match the Juniper's 9192. And if you configure flexible-vlan-tagging so you can support q-in-q, you've now got to configure the Cisco side for 9170. I agree with the rest, but actually, I've found that Juniper has a manual physical MTU with a separate logical MTU available, while Cisco sets a logical MTU and autocalculates the physical MTU (or perhaps the physical is just hard set to maximum). It depends on the equipment in Cisco, though. L3 and L2 interfaces treat MTU differently, especially noticeable when doing q-in-q on default switches without adjusting the MTU. Also noticeable in MTU setting methods on a c7600 (L2 vs L3 methods). In practice, I think you can actually pop the physical MTU on the Juniper much higher than necessary, so long as you set the family-based logical MTUs at the appropriate value. Cisco calls this 'routing mtu' and 'jumbo mtu' on the platform we have to distinguish between layer 3 MTU (where packets which exceed that size get fragmented) and layer 2 MTU (where frames that exceed that size get dropped on the floor as 'giants'). We always set layer 2 MTU as high as we can on our switches (9000+), and strictly leave everything else (layer 3) at 1500 bytes. 
In my experience, setting two hosts to differing layer 3 MTUs will lead to fragmentation at some point along the routing path or within one of the hosts. With Path MTU Discovery moved to the end hosts in v6, the concept of a standardized MTU should go away, and open up much larger MTUs. However, that may not happen until dual stacked v4/v6 goes away. -- Dan White
RE: RINA - scott whaps at the nanog hornets nest :-)
As long as the implementations are few and far between: https://www.psc.edu/~mathis/MTU/ http://www.ietf.org/mail-archive/web/rrg/current/msg05816.html the traditional ICMP-based PMTUD is what most of us face today. Steinar Haug, Nethelp consulting, sth...@nethelp.no It is already the standard with currently shipping Solaris and on by default. It ships in Linux 2.6.32 but is off by default (the sysctl I referred to earlier). It ships with Microsoft Windows as Blackhole Router Detection and is on by default since Windows 2003 SP2. The notion that it isn't widely deployed is out of date. It has been much more widely deployed now than it was 12 months ago. And again, deploying 9000 byte MTU in the MIDDLE of the network is not going to change PMTUD one iota unless the rest of the path between both end points is 9000 bytes, since the end points are already probably 1500 anyway. Changing the MTU on a router in the path is not going to cause the packets flowing through it to change in size. It will not introduce any additional PMTU issues as those are end-to-end problems anyway; if anything it should REDUCE them by making the path 9000 byte clean in the middle. There shouldn't BE any PMTU problems in the middle of the network, and things like reduced effective MTU from tunnels in the middle of networks disappear. For example, if some network is using MTU 1500 and tunnels something over GRE and doesn't enlarge the MTU of the interfaces handling that tunnel, and if they block ICMP from inside their net, then they have introduced a PMTU issue by reducing the effective MTU of the encapsulated packets. I deal with that very problem all the time. Increasing the MTU on those paths to 9000 would enable 1500 byte packets to travel unmolested and eliminate that PMTU problem. In fact, many networks already get around that problem by increasing the MTU on tunnels just so they can avoid fragmenting the encapsulated packet. 
Increasing to 9000 would REDUCE problems across the network for end points using an MTU smaller than 9000.
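The GRE case above is simple arithmetic (assuming IPv4 GRE with no key or checksum options: a 20-byte outer IP header plus a 4-byte basic GRE header):

```shell
mtu=1500
overhead=$((20 + 4))       # outer IPv4 header + basic GRE header
echo $((mtu - overhead))   # 1476: largest inner packet that fits without fragmentation
```

A 1500-byte packet entering such a tunnel must therefore be fragmented or dropped, which is exactly the black hole described when ICMP is also blocked; raising the tunnel path's MTU to 9000 makes the 24 bytes of overhead irrelevant for 1500-byte traffic.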
RE: RINA - scott whaps at the nanog hornets nest :-)
While it reads well, what implementations are actually in use? As with most protocols, it is useless if it doesn't have a high penetration. Jack Solaris 10, in use and on by default. Available on Windows for a very long time as blackhole router detection; it was off by default originally, but has been on by default since Windows 2003 SP2 and Windows XP SP3, and is on by default in Win7. It is available on Linux but not yet on by default. I expect that will change once it gets enough use. I am not sure of the default deployment in MacOS and BSD but know it is available.
Re: RINA - scott whaps at the nanog hornets nest :-)
On Sat, Nov 06, 2010 at 02:21:51PM -0700, George Bonser wrote: That is not a new problem. That is also true to today with last mile links (e.g. dialup) that support 1500 byte MTU. What is different today is RFC 4821 PMTU discovery which deals with the black holes. RFC 4821 PMTUD is that negotiation that is lacking. It is there. It is deployed. It actually works. No more relying on someone sending the ICMP packets through in order for PMTUD to work! The only thing this adds is trial-and-error probing mechanism per flow, to try and recover from the infinite blackholing that would occur if your ICMP is blocked in classic PMTUD. If this actually happened in any scale, it would create a performance and overhead penalty that is far worse than the original problem you're trying to solve. Say you have two routers talking to each other over a L2 switched infrastructure (i.e. an exchange point). In order for PMTUD to function quickly and effectively, the two routers on each end MUST agree on the MTU value of the link between them. If router A thinks it is 9000, and router B thinks it is 8000, when router A comes along and tries to send a 8001 byte packet it will be silently discarded, and the only way to recover from this is with trial-and-error probing by the endpoints after they detect what they believe to be MTU blackholing. This is little more than a desperate ghetto hack designed to save the connection from complete disaster. The point where a protocol is needed is between router A and router B, so they can determine the MTU of the link, without needing to involve the humans in a manual negotiation process. Ideally this would support multi-point LANs over ethernet as well, so .1 could have an MTU of 9000, .2 could have an MTU of 8000, etc. 
And of course you have to make sure that you can actually PASS the MTU across the wire (if the switch in the middle can't handle it, the packet will also be silently dropped), so you can't just rely on the other side to tell you what size it THINKS it can support. You don't have a shot in hell of having MTUs negotiated correctly or PMTUD work well until this is done. Is there any gear connected to a major IX that does NOT support large frames? I am not aware of any manufactured today. Even cheap D-Link gear supports them. I believe you would be hard-pressed to locate gear that doesn't support it at any major IX. Granted, it might require the change of a global config value and a reboot for it to take effect in some vendors. http://darkwing.uoregon.edu/~joe/jumbo-clean-gear.html If that doesn't prove my point about every vendor having their own definition of what # is and isn't supported, I don't know what does. Also, I don't know what exchanges YOU connect to, but I very clearly see a giant pile of gear on that list that is still in use today. :) As for the configuration differences between units, how does that change from the way things are now? A person configuring a Juniper for 1500 byte packets already must know the difference as that quirk of including the headers is just as true at 1500 bytes as it is at 9000 bytes. Does the operator suddenly become less competent with their gear when they use a different value? Also, a 9000 byte MTU would be a happy value that practically everyone supports these days, including ethernet adaptors on host machines. Everything defaults to 1500 today, so nobody has to do anything. Again, I'm actually doing this with people today on a very large network with lots of peers all over the world, so I have a little bit of experience with exactly what goes wrong. Nearly everyone who tries to figure out the correct MTU between vendors and with a third party network gets it wrong, at least some significant percentage of the time. 
And honestly I can't even find an interesting number of people willing to turn on BFD, something with VERY clear benefits for improving failure detection time over an IX (for the next time Equinix decides to do one of their 10PM maintenances that causes hours of unreachability until hold timers expire :P). If the IX operators saw any significant demand they would have already turned it on already. -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
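For context on the BFD aside above, enabling BFD for an EBGP peer is close to a one-liner on most platforms. A hedged sketch in IOS-style syntax; the interface, timers, ASN, and peer address (from the 192.0.2.0/24 documentation range) are all hypothetical:

```
interface TenGigabitEthernet0/0
 bfd interval 300 min_rx 300 multiplier 3
router bgp 65001
 neighbor 192.0.2.1 fall-over bfd
```

With 300 ms intervals and a multiplier of 3, a dead peer is detected in under a second instead of waiting out the BGP hold timer.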
Re: RINA - scott whaps at the nanog hornets nest :-)
On 11/6/2010 3:14 PM, George Bonser wrote: It ships with Microsoft Windows as Blackhole Router Detection and is on by default since Windows 2003 SP2. The first item returned on a blekko search is the following article which indicates that it is on by default in Windows 2008/Vista/2003/XP/2000. The article seems to predate Win7. hth, Doug -- Nothin' ever doesn't change, but nothin' changes much. -- OK Go Breadth of IT experience, and depth of knowledge in the DNS. Yours for the right price. :) http://SupersetSolutions.com/
RE: RINA - scott whaps at the nanog hornets nest :-)
The only thing this adds is trial-and-error probing mechanism per flow, to try and recover from the infinite blackholing that would occur if your ICMP is blocked in classic PMTUD. If this actually happened in any scale, it would create a performance and overhead penalty that is far worse than the original problem you're trying to solve. I ran into this very problem not long ago when attempting to reach a server for a very large network. Our Solaris hosts had no problem transacting with the server. Our linux machines did have a problem and the behavior looked like a typical PMTU black hole. It turned out that very large network tunneled the connection inside their network reducing the effective MTU of the encapsulated packets and blocked ICMP from inside their net to the outside. Changing the advertised MSS of the connection to that server to 1380 allowed it to work ( ip route add ip address via gateway dev device advmss 1380 ) and that verified that the problem was an MTU black hole. A little reading revealed why Solaris wasn't having the problem but Linux did. Setting the Linux ip_no_pmtu_disc sysctl to 1 resulted in the Linux behavior matching the Solaris behavior. Say you have two routers talking to each other over a L2 switched infrastructure (i.e. an exchange point). In order for PMTUD to function quickly and effectively, the two routers on each end MUST agree on the MTU value of the link between them. If router A thinks it is 9000, and router B thinks it is 8000, when router A comes along and tries to send a 8001 byte packet it will be silently discarded, and the only way to recover from this is with trial-and-error probing by the endpoints after they detect what they believe to be MTU blackholing. This is little more than a desperate ghetto hack designed to save the connection from complete disaster. Correct. Devices on the same vlan will need to use the same MTU. And why is that a problem? That is just as true then as it is today. Nothing changes. 
All you are doing is changing from everyone using 1500 to everyone using 9000 on that vlan. Nothing else changes. Why is that any kind of issue? The point where a protocol is needed is between router A and router B, so they can determine the MTU of the link, without needing to involve the humans in a manual negotiation process. When the TCP/IP connection is opened between the routers for a routing session, they should each send the other an MSS value that says how large a packet they can accept. You already have that information available. TCP provides that negotiation for directly connected machines. Again, nothing changes from the current method of operating. If I showed up at a peering switch and wanted to use 1000 byte MTU, I would probably have some problems. The point I am making is that 1500 is a relic value that hamstrings Internet performance and there is no good reason not to use 9000 byte MTU at peering points (by all participants) since it A: introduces no new problems and B: I can't find a vendor of modern gear at a peering point that doesn't support it though there may be some ancient gear at some peering points in use by some of the peers. I can not think of a problem changing from 1500 to 9000 as the standard at peering points introduces. It would also speed up the loading of the BGP routes between routers at the peering points. If Joe Blow at home with a dialup connection with an MTU of 576 is talking to a server at Y! with an MTU of 10 billion, changing a peering path from 1500 to 9000 bytes somewhere in the path is not going to change that PMTU discovery one iota. It introduces no problem whatsoever. It changes nothing. If that doesn't prove my point about every vendor having their own definition of what # is and isn't supported, I don't know what does. Also, I don't know what exchanges YOU connect to, but I very clearly see a giant pile of gear on that list that is still in use today. :) That is a list of 9000 byte clean gear. 
The very bottom is the stuff that doesn't support it. Of the stuff that doesn't support it, how much is connected directly to a peering point? THAT is the bottleneck I am talking about right now. One step at a time. Removing the bottleneck at the peering points is all I am talking about. That will not change PMTU issues elsewhere and those will stand just exactly as they are today without any change. In fact it will ensure that there are *fewer* PMTU discovery issues by being able to support a larger range of packets without having to fragment them. We *already* have SONET MTU of 4000 and this hasn't broken anything since the invention of SONET.
RE: RINA - scott whaps at the nanog hornets nest :-)
and that verified that the problem was an MTU black hole. A little reading revealed why Solaris wasn't having the problem but Linux did. Setting the Linux ip_no_pmtu_disc sysctl to 1 resulted in the Linux behavior matching the Solaris behavior. Oops, meant tcp_mtu_probing
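For reference, that Linux knob can be flipped as follows. A sketch only: it requires root, and per the kernel's ip-sysctl documentation the values are 0 = off, 1 = probe after a black hole is detected, 2 = always probe.

```shell
# Enable RFC 4821 packetization-layer PMTUD on Linux
sysctl -w net.ipv4.tcp_mtu_probing=1
# or persistently:
echo 'net.ipv4.tcp_mtu_probing = 1' >> /etc/sysctl.conf
```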
Re: Emulating a cellular interface
Or Linux Netem http://www.linuxfoundation.org/collaborate/workgroups/networking/netem Suresh On Sat, Nov 6, 2010 at 6:50 AM, Andy Davidson a...@nosignal.org wrote: On 6 Nov 2010, at 05:53, Saqib Ilyas wrote: A friend of mine is doing some testing where he wishes to emulate a cellular-like interface with random drops and all, out of an ethernet interface. Since we have plenty of network and system ops on the list, I thought we might have luck posting the question here. Is anyone aware of a simple tool to do this, other than rather involved configuration of iptables? Notwithstanding Mikael's comments that it shouldn't be lossy, at times when you want to simulate lossy (and jittery, and shaped, and ) conditions, the best way I have found to do this is FreeBSD's dummynet: http://www.freebsd.org/cgi/man.cgi?query=ipfw&sektion=8#TRAFFIC_SHAPER_(DUMMYNET)_CONFIGURATION Andy
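Building on the netem pointer above, a minimal sketch of a "cellular-like" profile. The device name and numbers are illustrative rather than tuned to any real network; netem option support varies by kernel version, and on 2010-era kernels rate limiting means chaining a tbf qdisc under netem rather than a netem rate option:

```shell
# Large, highly variable delay (25% correlation), plus occasional loss and reordering
tc qdisc add dev eth0 root handle 1: netem delay 300ms 200ms 25% loss 0.5% reorder 1%
# Shape to a cellular-ish rate by chaining tbf under the netem qdisc
tc qdisc add dev eth0 parent 1:1 handle 10: tbf rate 384kbit burst 16kb latency 400ms
# Tear down when finished
tc qdisc del dev eth0 root
```

This still won't reproduce the RLC-retransmit stalls and idle-state wakeup delay Mikael describes; those multi-second bursts would need something scripted on top (e.g. periodically swapping the netem parameters).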
RE: RINA - scott whaps at the nanog hornets nest :-)
Re: large MTU One place where this has the potential to greatly improve performance is in transfers of large amounts of data such as vendors supporting the downloading of movies, cloud storage vendors, and movement of other large content and streaming. The *first* step in being able to realize those gains is in removing the low hanging fruit of bottlenecks in that path. The lowest hanging fruit is the peering points. Changing those should introduce no new problems, as the peering points aren't currently the source of MTU path discovery problems and increasing the MTU removes a discovery issue point; only reducing the MTU would create one. In transitioning from SONET to Ethernet, we are actually reducing potential performance by reducing the effective MTU from 4000 to 1500. So even increasing bandwidth is of no use if you are potentially reducing performance end to end by reducing the effective maximum MTU of the path. In that diagram on Phil Dykstra's page linked to earlier, even though the packets on that OC3 backbone were mostly (by a large margin) 1500 bytes or smaller, the majority of the TRAFFIC was carried by packets of 1500 bytes or larger. http://sd.wareonearth.com/~phil/pktsize_hist.gif The above graph is from a study[1] of traffic on the InternetMCI backbone in 1998. It shows the distribution of packet sizes flowing over a particular backbone OC3 link. There is clearly a wall at 1500 bytes (the ethernet limit), but there is also traffic up to the 4000 byte FDDI MTU. But here is a more surprising fact: while the number of packets larger than 1500 bytes appears small, more than 50% of the bytes were carried by such packets because of their larger size. [1] The nature of the beast: recent traffic measurements from an Internet backbone http://www.caida.org/outreach/papers/1998/Inet98/
Re: RINA - scott whaps at the nanog hornets nest :-)
On Sat, Nov 06, 2010 at 03:49:19PM -0700, George Bonser wrote:

> When the TCP/IP connection is opened between the routers for a routing session, they should each send the other an MSS value that says how large a packet they can accept. You already have that information available. TCP provides that negotiation for directly connected machines.

You're proposing that routers should dynamically alter the interface MTU based on the TCP MSS value they receive from an EBGP neighbor? I barely know where to begin, but first off MSS is not MTU; it is only loosely related to MTU. MSS is affected by TCP options (window scale, SACK, MD5 authentication, etc.), and MSS between routers can be set to any value a user chooses. There is absolutely no guarantee that MSS is going to lead to a correct guess at the MTU. Also, many routers still default to having PMTUD turned off; would you suggest that they should set the physical interface MTU to 576 based on that? :)

And alas, it's one hell of a layer violation too. A negotiation protocol is needed, but you could argue about where it should be for days. Maybe at the physical layer as part of auto-negotiation, maybe at the L3-L2 boundary (i.e. negotiate it per IP as part of ARP or neighbor discovery), hell, maybe even in BGP, but keyed off MSS is way over the top. :)

> Again, nothing changes from the current method of operating. If I showed up at a peering switch and wanted to use 1000 byte MTU, I would probably have some problems. The point I am making is that 1500 is a relic value that hamstrings Internet performance and there is no good reason not to use 9000 byte MTU at peering points (by all participants) since it A: introduces no new problems and B: I can't find a vendor of modern gear at a peering point that doesn't support it, though there may be some ancient gear at some peering points in use by some of the peers.

Have you ever tried showing up to the Internet with a 1000 byte MTU? The only time that works correctly today is when you're rewriting TCP MSS values as the packet goes through the constrained link, which may be fine for the GRE tunnel to a Linux box at your house, but clearly can't work on the real Internet.

> I cannot think of a problem changing from 1500 to 9000 as the standard at peering points introduces.

This suggests a serious lack of imagination on your part. :)

> It would also speed up the loading of the BGP routes between routers at the peering points.

It's a very, very modest increase at best.

> If Joe Blow at home with a dialup connection with an MTU of 576 is talking to a server at Y! with an MTU of 10 billion, changing a peering path from 1500 to 9000 bytes somewhere in the path is not going to change that PMTU discovery one iota. It introduces no problem whatsoever. It changes nothing.

You know, one very good reason for people on a dialup connection to have low MTUs is serialization delay. As link speeds have gotten faster but MTUs have stayed the same, one tangible benefit is the lack of a need for fair queueing to keep big packets from significantly increasing the latency of small packets.

Overall I agree with the theory of larger MTUs... Improved efficiency, being able to do page-flipping with your payload, not having to worry about screwing things up if you DO need to use a tunnel or turn on IPsec, it's all well and good... But from a practical standpoint there are still a lot of very serious issues that have not been addressed, and anyone who actually tries to do this at scale is in for a world of hurt. I for one would love to see the situation improved, but trying to gloss over it and pretend the problems don't exist just delays the day when it actually CAN be supported.

> That is a list of 9000 byte clean gear. The very bottom is the stuff that doesn't support it. Of the stuff that doesn't support it, how much is connected directly to a peering point? THAT is the bottleneck.

This argument is completely destroyed at the line that says 7206VXR w/PA-GE; you don't need to read any further.

> I am talking about right now. One step at a time. Removing the bottleneck at the peering points is all I am talking about. That will not change PMTU issues elsewhere, and those will stand just exactly as they are today without any change. In fact it will ensure that there are *fewer* PMTU discovery issues by being able to support a larger range of packets without having to fragment them.

The issues I listed are precisely why it doesn't work at peering points. I know this because I do a lot of peering, and I spend a lot of time dealing with getting people to peer at larger MTU values (correctly). If it was easier to do without breaking stuff, I'd be a lot more successful at it. :)

> We *already* have SONET MTU of 4000 and this hasn't broken anything since the invention of SONET.

SONET MTU works because it's on by default, it's the same size everywhere,
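The serialization-delay point about dialup is easy to quantify; a quick sketch (link speeds are illustrative):

```python
# Serialization delay: time to clock one packet onto the wire.
def serialization_ms(packet_bytes: int, link_bps: float) -> float:
    return packet_bytes * 8 / link_bps * 1000

# A 1500-byte frame on a 56 kbit/s dialup link takes ~214 ms just to
# serialize, so one big packet ahead of you adds a fifth of a second of
# latency; hence the small MTUs (and fair queueing) on slow links.
dialup = serialization_ms(1500, 56_000)        # ~214 ms

# A 9000-byte jumbo frame on GigE serializes in ~0.072 ms, which is why
# the same concern no longer applies at modern link speeds.
gige = serialization_ms(9000, 1_000_000_000)   # ~0.072 ms
```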
Re: RINA - scott whaps at the nanog hornets nest :-)
On Nov 6, 2010, at 10:38 AM, Mark Smith wrote: On Fri, 5 Nov 2010 21:40:30 -0400 Marshall Eubanks t...@americafree.tv wrote: On Nov 5, 2010, at 7:26 PM, Mark Smith wrote: On Fri, 5 Nov 2010 15:32:30 -0700 Scott Weeks sur...@mauigateway.com wrote: It's really quiet in here. So, for some Friday fun let me whap at the hornets nest and see what happens... ;-) http://www.ionary.com/PSOC-MovingBeyondTCP.pdf Whoever wrote that doesn't know what they're talking about. LISP is not the IETF's proposed solution (the IETF don't have one, the IRTF do), Um, I would not agree. The IRTF RRG considered and is documenting a lot of things, but did not come to any consensus as to which one should be a proposed solution. I probably got a bit keen, I've been reading through the IRTF RRG Recommendation for a Routing Architecture draft which, IIRC, makes a recommendation to pursue the Identifier/Locator Network Protocol rather than LISP. That is not a consensus document - as it says To this end, this document surveys many of the proposals that were brought forward for discussion in this activity, as well as some of the subsequent analysis and the architectural recommendation of the chairs. and (Section 17) Unfortunately, the group did not reach rough consensus on a single best approach. The Chairs suggested that work continue on ILNP, but it is a stretch to characterize that as the RRG's solution, much less the IRTF's. (LISP is an IETF WG now, but with an experimental focus on its charter - The LISP WG is NOT chartered to develop the final or standard solution for solving the routing scalability problem.) Regards Marshall Regards, Mark. Regards Marshall and streaming media was seen to be one of the early applications of the Internet - these types of applications are why TCP was split out of IP, why UDP was invented, and why UDP has a significantly different protocol number to TCP.
-- NAT is your friend IP doesn’t handle addressing or multi-homing well at all The IETF’s proposed solution to the multihoming problem is called LISP, for Locator/Identifier Separation Protocol. This is already running into scaling problems, and even when it works, it has a failover time on the order of thirty seconds. TCP and IP were split the wrong way IP lacks an addressing architecture Packet switching was designed to complement, not replace, the telephone network. IP was not optimized to support streaming media, such as voice, audio broadcasting, and video; it was designed to not be the telephone network. -- And so, ...the first principle of our proposed new network architecture: Layers are recursive. I can hear the angry hornets buzzing already. :-) scott
Re: RINA - scott whaps at the nanog hornets nest :-)
* gbon...@seven.com (George Bonser) [Sun 07 Nov 2010, 00:30 CET]: Re: large MTU

> One place where this has the potential to greatly improve performance is in transfers of large amounts of data such as vendors supporting the downloading of movies, cloud storage vendors, and movement of other large content and streaming. The *first* step in being able to realize those gains is in removing the low hanging fruit of bottlenecks in that path. The lowest hanging fruit is the peering points. Changing those should introduce no new problems as the peering points aren't currently the source of MTU path discovery problems, and increasing the MTU removes a discovery issue point; only reducing the MTU would create one.

On the contrary. You're proposing to fuck around with the one place on the whole Internet that has pretty clear and well adhered-to rules and expectations about MTU size supported by participants, and basically re-live the problems that MAE-East and other shared Ethernet/FDDI platforms with mismatching MTU sizes brought us during their existence.

> In transitioning from SONET to Ethernet, we are actually reducing potential performance by reducing the effective MTU from 4000 to 2000. So even increasing bandwidth is of no use if you are potentially reducing performance end to end by reducing the effective maximum MTU of the path.

These performance gains are minimal at best, and probably completely offset by the delays introduced by the packet loss that the probing will cause for any connection that doesn't live close to forever. I'm not even going to bother commenting on your research link from production traffic in *1998*.

-- Niels.

-- It's amazing what people will do to get their name on the internet, which is odd, because all you really need is a Blogspot account. -- roy edroso, alicublog.blogspot.com
RE: RINA - scott whaps at the nanog hornets nest :-)
> On the contrary. You're proposing to fuck around with the one place on the whole Internet that has pretty clear and well adhered-to rules and expectations about MTU size supported by participants, and basically re-live the problems that MAE-East and other shared Ethernet/FDDI platforms with mismatching MTU sizes brought us during their existence.

Ok, there is another alternative. Peering points could offer a 1500 byte vlan and a 9000 byte vlan on existing peering points, and all new ones could be 9000 from the start. Then there is no fucking around with anything. You show up to the new peering point, your MTU is 9000, you are done. No messing with anything. Only SHORTENING MTUs in the middle causes PMTU problems. Increasing them does not. And someone attempting to send frames larger than 1500 right now would see only a decrease in PMTU issues from such an increase in MTU at the peering points, not an increase of issues.

> These performance gains are minimal at best, and probably completely offset by the delays introduced by the packet loss that the probing will cause for any connection that doesn't live close to forever.

Huh? You don't need to do probing. You can simply operate in passive mode. Also, even if using active probing mode, the probing stops once the MTU is discovered. In passive mode there is no probing at all unless you hit a black hole. And the performance improvements I suppose are minimal if you consider going from a maximum of 6.5 Meg/sec for a transfer from LA to NY to 40 Meg for the same transfer to be minimal. From one of the earlier linked documents:

(quote) Let's take an example: New York to Los Angeles. Round Trip Time (rtt) is about 40 msec, and let's say packet loss is 0.1% (0.001). With an MTU of 1500 bytes (MSS of 1460), TCP throughput will have an upper bound of about 6.5 Mbps! And no, that is not a window size limitation, but rather one based on TCP's ability to detect and recover from congestion (loss). With 9000 byte frames, TCP throughput could reach about 40 Mbps. Or let's look at that example in terms of packet loss rates. Same round trip time, but let's say we want to achieve a throughput of 500 Mbps (half a gigabit). To do that with 9000 byte frames, we would need a packet loss rate of no more than 1x10^-5. With 1500 byte frames, the required packet loss rate is down to 2.8x10^-7! While the jumbo frame is only 6 times larger, it allows us the same throughput in the face of 36 times more packet loss. (end quote)

So if you consider a 5x performance boost to be minimal, yeah, I guess. Or being able to operate at today's transfer rates in the face of 36x more packet loss to be a minimal improvement, I suppose.
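The quoted figures follow from the well-known Mathis et al. approximation, throughput ~= C * MSS / (rtt * sqrt(loss)), with the constant C around 0.7; a quick sanity check (MSS values assume standard 40 bytes of IPv4+TCP headers):

```python
from math import sqrt

def mathis_mbps(mss_bytes: int, rtt_s: float, loss: float, c: float = 0.7) -> float:
    """Approximate TCP throughput ceiling in Mbps per the Mathis formula."""
    return c * mss_bytes * 8 / (rtt_s * sqrt(loss)) / 1e6

rtt, loss = 0.040, 0.001                 # the NY<->LA example from the quote
small = mathis_mbps(1460, rtt, loss)     # ~6.5 Mbps  (1500-byte MTU)
jumbo = mathis_mbps(8960, rtt, loss)     # ~39.7 Mbps (9000-byte MTU)
```

The ratio between the two is simply 8960/1460, about 6.1x, matching the "roughly 6 times larger frame" framing in the quote.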
RE: RINA - scott whaps at the nanog hornets nest :-)
So if you consider 5x performance boost to be minimal yeah, I guess. Or being able to operate at todays transfer rates in the face of 36x more packet loss to be minimal improvement, I suppose. And those improvements in performance get larger the longer the latency of the connection. For transit from US to APAC or Europe, the improvement would be even greater.
Re: Emulating a cellular interface
On Sat, Nov 06, 2010, Andy Davidson wrote:

> Notwithstanding Mikael's comments that it shouldn't be lossy, at times when you want to simulate lossy (and jittery, and shaped, and ...) conditions, the best way I have found to do this is FreeBSD's dummynet: http://www.freebsd.org/cgi/man.cgi?query=ipfw&sektion=8#TRAFFIC_SHAPER_(DUMMYNET)_CONFIGURATION

And cellular networks are bursty, depending upon (from what I can gather) how busy the cell is and how many people are currently doing data. Someone more cellular-oriented should drop in their 2c. So to be completely accurate, you may want to script some per-node shaping rules that watch traffic flow and adjust the rules to emulate this. I recall seeing a few apps that behaved poorly when their UDP data exchange timed out because my 3G connection was in a slow mode and didn't recover well. It required a background ICMP to keep the damned session nailed up. :-) Adrian
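For reference, a minimal dummynet sketch along the lines Andy describes (the numbers and the `em0` interface name are illustrative placeholders, not a calibrated 3G model): a rate-limited pipe with added delay, a short queue, and random loss. Note dummynet has no direct jitter knob; jitter emerges from queueing behind the bandwidth limit, and a script can rewrite the pipe parameters on the fly to fake burstiness.

```shell
# FreeBSD ipfw/dummynet: emulate a slow, laggy, lossy link.
# 'plr' is the packet loss rate; 'queue' bounds buffering.
ipfw pipe 1 config bw 384Kbit/s delay 150ms plr 0.01 queue 20
ipfw pipe 2 config bw 384Kbit/s delay 150ms plr 0.01 queue 20

# Push traffic through a pipe in each direction.
ipfw add 100 pipe 1 ip from any to any out via em0
ipfw add 200 pipe 2 ip from any to any in via em0
```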
Re: RINA - scott whaps at the nanog hornets nest :-)
On Sat, 06 Nov 2010 11:45:01 -0500 Jack Bates jba...@brightok.net wrote: On 11/5/2010 5:32 PM, Scott Weeks wrote: It's really quiet in here. So, for some Friday fun let me whap at the hornets nest and see what happens...;-) http://www.ionary.com/PSOC-MovingBeyondTCP.pdf SCTP is a great protocol. It has already been implemented in a number of stacks. With these benefits over that theory, it still hasn't become mainstream yet. People are against change. They don't want to leave v4. They don't want to leave tcp/udp. Technology advances, but people will only change when they have to. Lack of SCTP availability has nothing to do with people's avoidance of change - it's likely that Linux kernels deployed in the last 3 to 5 years already have it compiled in. IPv4 NAT is what has prevented it from being deployed, because NATs don't understand it and therefore can't NAT addresses carried within it. This is one of the reasons why NAT is bad for the Internet - it has prevented deployment and/or utilisation of new transport protocols, such as SCTP or DCCP, that provide benefits over UDP or TCP. Jack (lost brain cells actually reading that pdf) Glad I haven't then, just the quotes from it hurt. Regards, Mark.
Re: RINA - scott whaps at the nanog hornets nest :-)
On 11/6/2010 7:21 PM, George Bonser wrote: (quote) Let's take an example: New York to Los Angeles. Round Trip Time (rtt) is about 40 msec, and let's say packet loss is 0.1% (0.001). With an MTU of 1500 bytes (MSS of 1460), TCP throughput will have an upper bound of about 6.5 Mbps! And no, that is not a window size limitation, but rather one based on TCP's ability to detect and recover from congestion (loss). With 9000 byte frames, TCP throughput could reach about 40 Mbps. I prefer much less packet loss in a majority of my transmissions, which in turn brings those numbers closer together. Jack
Re: Ethernet performance tests
Hi all, do you know if I will be able to use two different vendors to execute these tests? For example, let's say that I have one JDSU unit on side A and an EXFO unit on side B. Will these tests work? If not, is there a way to execute these tests with two different vendors? Thanks ./diogo -montagner On Tue, Nov 2, 2010 at 2:13 AM, Holmes,David A dhol...@mwdh2o.com wrote: EXFO also sells the BRIX SLA verifier, which calculates RTT, packet loss, and jitter for various applications running on top of the link layer. -Original Message- From: Tim Jackson [mailto:jackson@gmail.com] Sent: Wednesday, October 27, 2010 6:54 PM To: Diogo Montagner Cc: nanog@nanog.org Subject: Re: Ethernet performance tests We dispatch a technician to an end-site and perform tests either head-to-head with another test set, or to a loop at a far end. We do ITU-T Y.156sam/EtherSAM and/or RFC2544 tests depending on the customer requirements. (some customers require certain tests for x minutes) http://www.exfo.com/en/Products/Products.aspx?Id=370 ^--All of our technicians are equipped with those EXFO sets and that module. Also covers SONET/DS1/DS3 testing as well in a single easy(er) to carry set.. -- Tim On Wed, Oct 27, 2010 at 6:32 PM, Diogo Montagner diogo.montag...@gmail.com wrote: Hello everyone, I am looking for a performance test methodology for ethernet-based circuits. These ethernet circuits can be: dark fiber, l2circuit (martini), l2vpn (kompella), vpls or ng-vpls. Sometimes the ethernet circuit can be a mix of these technologies, like below: CPE - metro-e - l2circuit - l2vpn - l2circuit - metro-e - CPE The goal is to verify the performance end-to-end. I am looking for tools that can check at least the following parameters: - loss - latency - jitter - bandwidth - out-of-order delivery At this moment I have been using IPerf to achieve these results.
But I would like to check if there are test devices that can be used in situations like that to verify ethernet-based circuit performance. The objective of these tests is to verify the signed SLAs of each circuit before the customer starts to use it. I checked all the MEF specifications and I only found something related to performance for Circuit Emulation over Metro-E (which is not my case). Appreciate your comments. Thanks! ./diogo -montagner
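As a software baseline before dedicated test sets, iperf (v2 flags shown; the address is a documentation placeholder) already covers several of the parameters listed, since its UDP server-side report includes jitter, loss, and out-of-order datagram counts:

```shell
# On the far-end CPE: run an iperf UDP server, reporting every second.
iperf -s -u -i 1

# On the near-end CPE: 60-second UDP test at the SLA rate.
# The server side reports jitter, loss, and out-of-order delivery.
iperf -c 192.0.2.10 -u -b 100M -t 60 -i 1

# TCP throughput check with parallel streams and a window sized for the path.
iperf -c 192.0.2.10 -P 4 -w 256K -t 60
```

This will not replace RFC2544/Y.1564-style certification with calibrated test sets, but it is vendor-neutral on both ends.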
Re: RINA - scott whaps at the nanog hornets nest :-)
On Sat, Nov 6, 2010 at 5:21 PM, George Bonser gbon...@seven.com wrote: ... (quote) Let's take an example: New York to Los Angeles. Round Trip Time (rtt) is about 40 msec, and let's say packet loss is 0.1% (0.001). With an MTU of 1500 bytes (MSS of 1460), TCP throughput will have an upper bound of about 6.5 Mbps! And no, that is not a window size limitation, but rather one based on TCP's ability to detect and recover from congestion (loss). With 9000 byte frames, TCP throughput could reach about 40 Mbps. I'd like to order a dozen of those 40ms RTT LA to NYC wavelengths, please. If you could just arrange a suitable demonstration of packet-level delivery time of 40ms from Los Angeles to New York and back, I'm sure there would be a *long* line of people behind me, checks in hand.^_^ Matt
RE: RINA - scott whaps at the nanog hornets nest :-)
> I prefer much less packet loss in a majority of my transmissions, which in turn brings those numbers closer together. Jack

True, though the idea that it greatly reduces packets in flight for a given amount of data gives a lot of benefit, particularly over high latency connections. Considering throughput = ~0.7 * MSS / (rtt * sqrt(packet_loss)) (from http://sd.wareonearth.com/~phil/jumbo.html) and that packet loss to places such as China is often greater than zero, the benefits of increased PMTU become obvious. Increase that latency from 20ms to 200ms and the benefits of increased MSS become even clearer. The only real argument here against changing existing peering points is that all peers must have the same MTU. So far I haven't heard any real argument against it for a new peering point which is starting from a green field. It isn't going to change how anyone's network behaves internally, and increasing MTU doesn't produce PMTU issues for transit traffic. It just seems a shame that two servers with FDDI interfaces using SONET long haul are going to perform much better on a coast to coast transfer than a pair with a GigE over ethernet long haul simply because of the MTU issue. Increasing the bandwidth of a path to GigE shouldn't result in reduced performance, but in this case it would. At least one peering point provider has offered to create a jumbo VLAN for experimentation.
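Inverting that same formula gives the loss rate a path can tolerate while sustaining a target rate. A sketch (using the 0.7 constant throughout, so the absolute numbers differ slightly from the figures quoted off the wareonearth page, but the roughly 36x ratio holds regardless of the constant):

```python
def max_loss(mss_bytes: int, rtt_s: float, target_bps: float, c: float = 0.7) -> float:
    """Max tolerable packet loss rate to sustain target_bps (Mathis formula inverted)."""
    return (c * mss_bytes * 8 / (rtt_s * target_bps)) ** 2

rtt, target = 0.040, 500e6                 # 500 Mbps coast to coast
loss_1500 = max_loss(1460, rtt, target)    # ~1.7e-7 tolerable loss
loss_9000 = max_loss(8960, rtt, target)    # ~6.3e-6 tolerable loss

# The tolerance scales with MSS squared: (8960/1460)^2 ~= 37.7.
ratio = loss_9000 / loss_1500
```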
RE: RINA - scott whaps at the nanog hornets nest :-)
I'd like to order a dozen of those 40ms RTT LA to NYC wavelengths, please. If you could just arrange a suitable demonstration of packet-level delivery time of 40ms from Los Angeles to New York and back, I'm sure there would be a *long* line of people behind me, checks in hand.^_^ Matt Yeah, he must have goofed on that. The 40ms must be the one-way time, not the RTT. I get a pretty consistent 80ms to NY from California.
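Both readings are defensible: roughly 40 ms is about the great-circle propagation floor for an LA-NYC round trip in fiber, while real fiber routes detour enough to land near the 80 ms observed in practice. A back-of-envelope check (the distance and fiber-speed figures are approximations):

```python
# Light in fiber travels at roughly c / 1.5, about 2.0e8 m/s.
GREAT_CIRCLE_LA_NYC_KM = 3_940   # approximate great-circle distance
V_FIBER_M_S = 2.0e8

def fiber_rtt_ms(path_km: float) -> float:
    """Round-trip propagation delay over a fiber path of the given length."""
    return 2 * path_km * 1000 / V_FIBER_M_S * 1000

floor = fiber_rtt_ms(GREAT_CIRCLE_LA_NYC_KM)       # ~39.4 ms theoretical floor
real = fiber_rtt_ms(GREAT_CIRCLE_LA_NYC_KM * 2)    # ~78.8 ms with a realistic detour
```

So the quoted 40 ms is best read as an idealized lower bound, not a deliverable circuit.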
Re: RINA - scott whaps at the nanog hornets nest :-)
* gbon...@seven.com (George Bonser) [Sun 07 Nov 2010, 04:27 CET]: It just seems a shame that two servers with FDDI interfaces using SONET Earth to George Bonser: IT IS NOT 1998 ANYMORE. -- Niels.
Re: RINA - scott whaps at the nanog hornets nest :-)
On 11/6/2010 10:31 PM, Niels Bakker wrote: * gbon...@seven.com (George Bonser) [Sun 07 Nov 2010, 04:27 CET]: It just seems a shame that two servers with FDDI interfaces using SONET Earth to George Bonser: IT IS NOT 1998 ANYMORE. We don't fly SR-71s or use bigger MTU interfaces. Get with the times! :) Jack
RE: RINA - scott whaps at the nanog hornets nest :-)
-Original Message- From: Niels Bakker [mailto:niels=na...@bakker.net] Sent: Saturday, November 06, 2010 8:32 PM To: nanog@nanog.org Subject: Re: RINA - scott whaps at the nanog hornets nest :-) * gbon...@seven.com (George Bonser) [Sun 07 Nov 2010, 04:27 CET]: It just seems a shame that two servers with FDDI interfaces using SONET Earth to George Bonser: IT IS NOT 1998 ANYMORE. Exactly my point. Why should we adopt newer technology while using configuration parameters that degrade performance? 1500 was designed for thick net. It is absolutely stupid to use it for GigE or higher speeds and I do mean absolutely idiotic. It is going backwards in performance. No wonder there is still so much transport using SONET. Using Ethernet reduces your effective performance over long distance paths.
RE: RINA - scott whaps at the nanog hornets nest :-)
* gbon...@seven.com (George Bonser) [Sun 07 Nov 2010, 04:27 CET]: It just seems a shame that two servers with FDDI interfaces using SONET Earth to George Bonser: IT IS NOT 1998 ANYMORE. Exactly my point. Why should we adopt newer technology while using configuration parameters that degrade performance? 1500 was designed for thick net. It is absolutely stupid to use it for GigE or higher speeds and I do mean absolutely idiotic. It is going backwards in performance. No wonder there is still so much transport using SONET. Using Ethernet reduces your effective performance over long distance paths. And by that I mean using 1500 MTU is what degrades the performance, not the ethernet physical transport. Using MTU 9000 would give you better performance than SONET. That is why Internet2 pushes so hard for people to use the largest possible MTU and the suggested MINIMUM is 9000.
Re: BGP support on ASA5585-X
I won't speak to the wrong solution for the wrong market, but as far as large ACLs go, I would agree with Tony. I've seen hundreds of different ASA configurations for a variety of customers in a variety of markets, and generally once you start reaching the limits of the box you start losing sight of what your original security policies are. In almost every case (not all) where I've seen resource exhaustion due to ACLs, it has gone hand in hand with security policies that aren't followed well or clear cut (i.e., overlapping security rules, lack of rule aggregation, not sure why rule X is in there, things of this nature). -Pete On Sat, Nov 6, 2010 at 9:54 AM, Tony Varriale tvarri...@comcast.net wrote: - Original Message - From: gordon b slater gordsla...@ieee.org To: Tony Varriale tvarri...@comcast.net Cc: nanog@nanog.org Sent: Saturday, November 06, 2010 4:38 AM Subject: Re: BGP support on ASA5585-X On Fri, 2010-11-05 at 21:50 -0500, Tony Varriale wrote: somebody said: They could make it out of the box but this is why Dylan made his statement. His statement is far fetched at best. Unless of course he's speaking of 100 million line ACLs. Can I just ask out of technical curiosity: Well, let me preface this thread with: the previous poster was/is from a hosting company. ASAs aren't ISP/Hosting level boxes. They are SMB to enterprise boxes. It's like saying yeah that 2501 doesn't meet our customer agg requirements at our ISP. Of course it doesn't. Wrong product wrong solution. With that said, from what I see in the field 10s of thousands. I've seen as high as 80k. But, once you get into that many ACLs, IMO there's either an ACL or security/network design problem. tv
.mil broken again?
I'm seeing DNS lookup failures for us.af.mil, usmc.mil, us.army.mil, and navy.mil. Possibly more .mil are affected. This is getting way too frequent. Anybody got a good out-of-band (not .mil) contact for reporting this? Antonio Querubin 808-545-5282 x3003 e-mail/xmpp: t...@lava.net