Re: Level(3) filtering (was Yahoo outage summary)
On Jul 9, 2007, at 8:10 PM, Chris L. Morrow wrote:

> In the number of customer conversations I've had about this it's always
> sort of surprising that people think it's 'ok' to not have a prefix-list
> :( cause, guess what: "if you don't have one and they don't have one...
> THEY will get you eventually"

Many folks seem to think that they'll be OK because 'someone else' will be doing this for them, and so they're protected. They also don't think about the fact that they themselves could accidentally cause a problem for others (and, in some cases, for themselves, by acting as an inadvertent sinkhole).

But when it's explained to them that a) if everyone thinks that 'someone else' will do the appropriate filtering, then nobody will do it, and b) that they can end up hosing themselves and also taking a big reputational hit, most people I talk to about this seem to understand.

The problem is that this is largely an ad-hoc, 1:1 type of educational effort, which doesn't scale well. And in many cases, folks seem to find it difficult to go to their management and explain that they must invest the opex to implement and maintain these policies (along with BCP38, iACLs, et al.); sort of an inversion of "The Emperor's New Clothes", heh.

---
Roland Dobbins <[EMAIL PROTECTED]> // 408.527.6376 voice

Culture eats strategy for breakfast.
  -- Ford Motor Company
Re: Level(3) filtering (was Yahoo outage summary)
On Mon, 9 Jul 2007, Kevin Epperson wrote:

> There is some misinformation in previous posts that I would like to
> clarify on the Level 3 side of things.

and I'd apologize for hinting that that might be the problem :(

> Level 3's own registry and known public route registries. As several
> folks have pointed out there are minimal checks for the validity of the
> source information.

this was what bit panix/edison I believe... :(

> As an aside I see an increase in the number of downstreams asking for
> as-path filtering or *no* filtering usually with justifications of ISP X
> doesn't require us to register routes or just does as-path filtering. In
> my opinion that is bad news for everyone as documented in numerous
> BCPs, presentations and route-leaks.

agreed, there is this trend, it's disturbing :( (to me at least)

In the number of customer conversations I've had about this it's always sort of surprising that people think it's 'ok' to not have a prefix-list :( cause, guess what: "if you don't have one and they don't have one... THEY will get you eventually"
Level(3) filtering (was Yahoo outage summary)
There is some misinformation in previous posts that I would like to clarify on the Level 3 side of things.

Every transit-like connection on AS3356 is prefix-filtered including all parties in this event. On AS3356 all prefix filters and import policies on BGP sessions are audited and checked in almost realtime for people or system errors (missing, mis-referenced, not referenced, otherwise broken config, etc.)

The prefix filters themselves are generated using data from Level 3's own registry and known public route registries. As several folks have pointed out there are minimal checks for the validity of the source information. Further details on Level 3 filtering policies are available at:

    whois -h rr.level3.net AS3356 | grep remarks

As an aside I see an increase in the number of downstreams asking for as-path filtering or *no* filtering usually with justifications of ISP X doesn't require us to register routes or just does as-path filtering. In my opinion that is bad news for everyone as documented in numerous BCPs, presentations and route-leaks.

-Kevin

Disclaimer - I do work for Level 3 but am expressing my opinions and not those of my employer.
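For illustration only, here is a minimal sketch of the kind of generation step described above: turning registry route objects into a per-customer prefix filter. The sample RPSL text, the private-use AS numbers (AS64500, AS64501), the documentation prefixes, and the function names are all assumptions made up for this example. It is not Level 3's tooling, and it does nothing to validate the source data, which is exactly the weakness noted above.

```python
import ipaddress

# Hypothetical registry dump: RPSL-style route objects separated by blank lines.
# IPv4 only; mixing IPv4 and IPv6 networks would break the sort below.
SAMPLE_RPSL = """\
route:   192.0.2.0/24
origin:  AS64500
source:  EXAMPLE

route:   198.51.100.0/24
origin:  AS64501
source:  EXAMPLE

route:   203.0.113.0/24
origin:  AS64500
source:  EXAMPLE
"""


def parse_route_objects(text):
    """Return a list of (network, origin) pairs from RPSL-style text."""
    routes = []
    for block in text.strip().split("\n\n"):
        attrs = {}
        for line in block.splitlines():
            # Split "key: value" at the first colon only.
            key, _, value = line.partition(":")
            attrs[key.strip().lower()] = value.strip()
        if "route" in attrs and "origin" in attrs:
            network = ipaddress.ip_network(attrs["route"])
            routes.append((network, attrs["origin"].upper()))
    return routes


def build_prefix_filter(routes, allowed_origins):
    """Keep only prefixes whose registered origin is in the customer's AS set."""
    allowed = {asn.upper() for asn in allowed_origins}
    return sorted({net for net, origin in routes if origin in allowed})


if __name__ == "__main__":
    customer_filter = build_prefix_filter(parse_route_objects(SAMPLE_RPSL), {"AS64500"})
    for net in customer_filter:
        print(f"permit {net}")
```

Anything the registry says about AS64500 ends up in the filter, right or wrong, so a step like this is only as good as the data it is fed.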
RE: IP Allocations and moving AS numbers
Shane-

Please redirect your email questions to ARIN ppml or discuss. That will be a better forum for you with these types of questions. I will also email you on the side.

Cheers!
Marla Azinger
Frontier Communications
AC Chair

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Shane Owens
Sent: Monday, July 09, 2007 2:35 PM
To: nanog@nanog.org
Subject: IP Allocations and moving AS numbers

All,

I have been all but gone from IP management and BGP administration tasks for over 2 years while teaching myself telecom as a CLEC. I recently had a past business acquaintance contact me that is currently reselling bandwidth and using a 3rd party's network to do so. He currently has about 15 /24 address blocks through this 3rd party and wants to move to his own AS number and away from theirs. I know when I was last involved it seemed a pretty difficult process to do this through ARIN. Has the process changed at all recently?

I am going to help them get the AS number and get the process started, but when asked if they could keep their existing IP address I explained that the existing 3rd party would need to write a letter stating that they are willing to transfer those IPs to your AS, ARIN would have to approve it and it may be a bit of a hassle. They are currently running 7 data centers nationally and are willing to migrate IPs, but would rather not if they can help it. Does this sound about right?

I am going to go read the ARIN pages tonight to see if I can answer this myself, but don't have time during the workday to do a lot of research on this myself. Figure someone here probably knows already.

Shane Owens
DNA Communications
[EMAIL PROTECTED]
(w)815-562-4290 x-201
(c)815-793-3822
IP Allocations and moving AS numbers
All,

I have been all but gone from IP management and BGP administration tasks for over 2 years while teaching myself telecom as a CLEC. I recently had a past business acquaintance contact me that is currently reselling bandwidth and using a 3rd party's network to do so. He currently has about 15 /24 address blocks through this 3rd party and wants to move to his own AS number and away from theirs. I know when I was last involved it seemed a pretty difficult process to do this through ARIN. Has the process changed at all recently?

I am going to help them get the AS number and get the process started, but when asked if they could keep their existing IP address I explained that the existing 3rd party would need to write a letter stating that they are willing to transfer those IPs to your AS, ARIN would have to approve it and it may be a bit of a hassle. They are currently running 7 data centers nationally and are willing to migrate IPs, but would rather not if they can help it. Does this sound about right?

I am going to go read the ARIN pages tonight to see if I can answer this myself, but don't have time during the workday to do a lot of research on this myself. Figure someone here probably knows already.

Shane Owens
DNA Communications
[EMAIL PROTECTED]
(w)815-562-4290 x-201
(c)815-793-3822
Re: Yahoo outage summary
On Mon, Jul 09, 2007 at 04:50:56PM -0400, Joe Abley wrote:
> SIDR is only of any widespread use if it is coupled with policy/
> procedures at the RIRs to provide certificates for resources that are
> assigned/allocated. However, this seems like less of a hurdle than
> you'd think when you look at how many RIR staff are involved in
> working on it.
>
> So, if you consider some future world where there are suitably
> machine-readable repositories of number resources (e.g. IRRs) are
> combined with machine-verifiable certificates affirming a customer's
> right to use them, how far out of the woods are we? Or are we going
> to find out that the real problem is some fundamental unwillingness
> to automate this stuff, or something else?

Going to a model with reasonable and well-defined policies and procedures is a good thing. However, it renders all the existing IRR information suspect. Even the RRs run by RIRs are worthless as they stand. For instance ARIN runs an RR but does no validation of what goes in there today.

A reasonable approach might be to pick up with tools based on the new SIDR work and leave the existing IRR info behind.

Tony
Re: Yahoo outage summary
On Mon, Jul 09, 2007 at 04:50:56PM -0400, Joe Abley wrote:
> On 9-Jul-2007, at 16:13, Jared Mauch wrote:
>
> > Some have automated systems, but they're dependent on IRR data
> > being correct. There are even tools to automate population of IRR data.
>
> Building customer filters from the IRR seems like it should fall in the
> "easy" bucket, given how long people have been doing it, and for how long.
> It's the lack of a way to trust the data that's published in the IRR that
> always seems to be the stumbling block.

-- snip --

> So, if you consider some future world where there are suitably
> machine-readable repositories of number resources (e.g. IRRs) are combined
> with machine-verifiable certificates affirming a customer's right to use
> them, how far out of the woods are we? Or are we going to find out that the
> real problem is some fundamental unwillingness to automate this stuff, or
> something else?

It's that some folks feel entitled to announce routes without registering them. Take ANS vs Sprintlink as the classic example. Not much has changed since then. Nor have the tools evolved significantly. Some vendors still don't get router configuration from tools yet. Try to automate something and it's either not easy or impossible. Even the best solutions on the market have some problems when you feed them an 8+Meg config. It takes a lot of cpu time to process that much.

There really need to be some (ick, ignore that I suggested this) Web 2.0 IRR tools. Something that can smartly populate an IRR or IRR-like dataset. Something that can be taught to 'learn' what is reasonable. I've seen some cool things that show promise (eg: pretty good bgp), but there's always some interesting drawback.

Plus, as Patrick said earlier, (and i generally agree), these types of "attacks" are rare and usually short lived. Even those like the panix situation didn't last very long. Perhaps it's not as important to think about now.

- Jared

--
Jared Mauch | pgp key available via finger from [EMAIL PROTECTED]
clue++; | http://puck.nether.net/~jared/ My statements are only mine.
Re: Yahoo outage summary
On 9-Jul-2007, at 16:13, Jared Mauch wrote:

> Some have automated systems, but they're dependent on IRR data
> being correct. There are even tools to automate population of IRR data.

Building customer filters from the IRR seems like it should fall in the "easy" bucket, given how long people have been doing it, and for how long. It's the lack of a way to trust the data that's published in the IRR that always seems to be the stumbling block.

Various ops-aware people have been attacking the correctness issue in the SIDR working group. The work seems fairly well-cooked to me, and I seem to think that Geoff Huston has wrapped some proof-of-concept tools around the crypto.

SIDR is only of any widespread use if it is coupled with policy/procedures at the RIRs to provide certificates for resources that are assigned/allocated. However, this seems like less of a hurdle than you'd think when you look at how many RIR staff are involved in working on it.

So, if you consider some future world where there are suitably machine-readable repositories of number resources (e.g. IRRs) are combined with machine-verifiable certificates affirming a customer's right to use them, how far out of the woods are we? Or are we going to find out that the real problem is some fundamental unwillingness to automate this stuff, or something else?

Joe
Re: Yahoo outage summary
On Mon, Jul 09, 2007 at 01:23:46PM -0500, Borchers, Mark M. wrote:
> Jared Mauch wrote:
>
> > The simple truth is that prefix lists ARE hard to manage.
>
> Medium-hard IMHO. Adding prefixes is relatively easy to implement.
> Tracking and removing outdated information significantly more challenging.
>
> > Some people lack tools and automation to make it work or to manage their
> > networks.
>
> Best I can tell, even the largest transit providers handle prefix list
> updates manually.

Some have automated systems, but they're dependent on IRR data being correct. There are even tools to automate population of IRR data.

> At this stage of history, a human interface is probably necessary in making
> a reasonable assessment about the legitimacy of an update request.

I think here is one of the cruxes of the problem. If it requires a human, there's a few things that will happen:

1) prefix-list volume will be too much to be dealt with. I see some per-asn prefix lists that would be 255k routes and include all sorts of unreasonable junk like /32's

2) even taking a reasonable network, (in this case, i picked AS286) I see 4425 routes. Either you check these all manually (at least once), or come up with some way to model it. I currently see 250 routes in the table with as-path _286_ from my view. Either there's a lot of cruft there, or there's a lot of multihomed folks where i see a better path. Which is it? Do I have the time to crunch this myself?

3) What about those unique customer relationships? (this is made up) Like where ATT buys transit from Cogent for those few prefixes in New Zealand they care about? There's always some compelling business case to do something wonky. Does this mean that ATT needs to register their prefixes in the cogent IRR? How do you keep it 'quiet' that this is happening, instead of an object saying 'att priority customer route'? How do you validate these? Even the 'big guys' will make policy mistakes once in a while.

There needs to be some 'better-way' IMHO, but my ideas on this topic have not gotten far enough along for me to put code behind them. Perhaps I'll need to reprioritize those efforts.

It seems to me like someone could do a cool system that churns through the route-views data, or if necessary just duplicate part of it by getting lots of bgp feeds and trying to parse the data.

Too bad there's not a good way to do something like dampening on routes where depending on the age of the announcement and some 'trust' factor you can assign a series of local-preferences. I'd really like to see something like this exist. ie: "dampen" the "new" path (even if the prefix is a longer one) until some timer has ticked (unless some policy criteria are satisfied, such as same as-path, etc..).

There's also the issue of how to implement this in the existing router(s), some of them with slower cpus. There's a lot of folks using older hardware to do bgp that just might melt if they had to evaluate some huge routing policy.

- Jared

--
Jared Mauch | pgp key available via finger from [EMAIL PROTECTED]
clue++; | http://puck.nether.net/~jared/ My statements are only mine.
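To make the "age plus trust" idea above a little more concrete, here is a rough sketch of how a local-preference could be derived from how long a path has been seen and a policy-assigned trust value, with an exemption when the as-path is unchanged. Every name and number in it (the local-pref values, the 24-hour hold timer, the Announcement class, the 0.0 to 1.0 trust scale) is an assumption invented for illustration; nothing like this is known to exist in any router implementation discussed in the thread.

```python
from dataclasses import dataclass

BASE_LOCAL_PREF = 100      # assumed preference for a fully trusted path
NEW_PATH_LOCAL_PREF = 50   # assumed floor for a brand-new, untrusted path
HOLD_SECONDS = 24 * 3600   # assumed timer before a new path is fully trusted


@dataclass
class Announcement:
    prefix: str
    as_path: tuple
    age_seconds: float   # how long this path has been seen
    trust: float         # 0.0 (unknown) .. 1.0 (fully trusted), set by policy


def local_pref(ann, previous_as_path=None):
    """Return a local-preference that rises with the path's age or trust."""
    # Policy exemption: an unchanged as-path is not treated as "new".
    if previous_as_path is not None and tuple(previous_as_path) == tuple(ann.as_path):
        return BASE_LOCAL_PREF
    # Clamp both inputs to the 0..1 range, then take whichever is higher.
    aged = min(max(ann.age_seconds, 0.0) / HOLD_SECONDS, 1.0)
    trust = min(max(ann.trust, 0.0), 1.0)
    maturity = max(aged, trust)
    span = BASE_LOCAL_PREF - NEW_PATH_LOCAL_PREF
    return round(NEW_PATH_LOCAL_PREF + span * maturity)


if __name__ == "__main__":
    fresh = Announcement("192.0.2.0/24", (64500, 64501), age_seconds=0, trust=0.0)
    print(local_pref(fresh))                                   # brand-new path
    print(local_pref(fresh, previous_as_path=(64500, 64501)))  # same as-path
```

One caveat: local-preference only ranks competing paths for the same prefix. A new, longer prefix would still win on longest match, so "dampening" it would mean holding it out of the table until the timer expires rather than merely lowering its preference.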
Re: Yahoo outage summary
On Jul 9, 2007, at 11:19 AM, jared mauch wrote:

> The simple truth is that prefix lists ARE hard to manage. There are a
> lot of folks that have complex relationships or don't see why they
> should register their routes. Some people lack tools and automation to
> make it work or to manage their networks. It would be nice to see
> everyone filter routes, including those from even transit and large
> peers. I don't think we will be able to ignore this forever. I also do
> not see the status quo changing soon either.

I'm not sure we can't ignore it forever. The telephone network has been around for a lot longer than the 'Net, has way, way, way more connections, and there are corners of it which are managed even worse than the inter-web.

Like Sean said, cost/benefit. If the cost of avoiding a 1 day outage per year is the same as a 5 day outage, management will not fix it.

--
TTFN,
patrick
Re: Yahoo outage summary
On Jul 9, 2007, at 9:31 AM, Randy Bush wrote:

> Tony Tauber wrote:
>> There's no magic bullet in updating BGP if a fundamental, verifiable
>> data model is not accepted and agreed upon.
>
> the space of routing data validation is large, we can explore it at our
> leisure, and we have been for some years. but my point was that it is
> silly to indulge in conjecturbation on the cause of the recent event
> and excoriate l(3), hanaro, or john curran's grandmother until we have
> heard from the folk who have actual data.

I can't help but conjecturbate how this might relate to route flap damping, and whether overly aggressive RFD might be related to such DoS. The other side of the coin would be that RFD might also limit the extent of spoofed routes. The amount of noise within the system makes it difficult for administrators to fully comprehend what happened while it is happening.

A means to even partially validate routing information might provide more timely and greater insight. This insight may help rule out nefarious causes. When it doesn't, the issue might be far more serious. Crying wolf too many times is bad, but not seeing the wolf could be worse.

-Doug
Re: Yahoo outage summary
On Tue, 10 Jul 2007, Randy Bush wrote:

> the space of routing data validation is large, we can explore it at our
> leisure, and we have been for some years. but my point was that it is
> silly to indulge in conjecturbation on the cause of the recent event
> and excoriate l(3), hanaro, or john curran's grandmother until we have
> heard from the folk who have actual data.

If companies thought it was in their self-interest, they might actually share that actual data. However history has shown over and over again that companies generally avoid any public discussion about their problems until they are overwhelmed.

http://tech.monstersandcritics.com/news/article_1327791.php/Yahoo_outage_caused_by_Level3_BGP_issue

If you wait for the companies to reveal the data, you will probably have a long wait. WorldCom still hasn't released its official investigative report into why its national frame networks failed for nearly a week in 1999.
Re: Yahoo outage summary
Tony Tauber wrote:
> On Mon, Jul 09, 2007 at 02:31:10PM +0800, Randy Bush wrote:
>>> following existing BCPs with currently-deployed
>>> techniques/functionality/features would have prevented the issue
>>> described in the post.
>> knowing that level(3) is one of the most serious deployments of
>> irr-based route filters and other prudent practices, perhaps we should
>> wait for a post mortem from level(3) before jumping to conclusions?
> There's no magic bullet in updating BGP if a fundamental, verifiable
> data model is not accepted and agreed upon.

the space of routing data validation is large, we can explore it at our leisure, and we have been for some years. but my point was that it is silly to indulge in conjecturbation on the cause of the recent event and excoriate l(3), hanaro, or john curran's grandmother until we have heard from the folk who have actual data.

randy
Re: Yahoo outage summary
On Mon, Jul 09, 2007 at 02:31:10PM +0800, Randy Bush wrote:
>
> > following existing BCPs with currently-deployed
> > techniques/functionality/features would have prevented the issue
> > described in the post.
>
> knowing that level(3) is one of the most serious deployments of
> irr-based route filters and other prudent practices, perhaps we should
> wait for a post mortem from level(3) before jumping to conclusions?
>
> randy

Level3's filter implementation is indeed well-done, however, the fact remains that the IRR (which I use and endorse) has no linkage to any other source of information for purposes of validation. It's fundamentally garbage in, garbage out.

Say some ISP has a provisioning tool which updates their router configs and the IRR in one fell swoop. If the provisioner makes a typo the IRR will gladly accept the entry for, say, 12/8, and the upstream will rebuild their filters with that entry automatically and you get the same result.

There's no magic bullet in updating BGP if a fundamental, verifiable data model is not accepted and agreed upon.

Tony
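As a sketch of what "linkage to another source of information" could look like in the 12/8 typo scenario above, the snippet below refuses any requested registry entry that is not contained in a block actually assigned to the customer. The ALLOCATIONS table, the private-use AS number, and the function names are hypothetical; where the authoritative allocation data would come from is exactly the open question in this thread, so treat this as an assumption-laden illustration rather than a fix.

```python
import ipaddress

# Hypothetical authoritative data: blocks actually assigned to each customer.
ALLOCATIONS = {
    "AS64500": [
        ipaddress.ip_network("192.0.2.0/24"),
        ipaddress.ip_network("198.51.100.0/24"),
    ],
}


def is_covered(origin, prefix, allocations=ALLOCATIONS):
    """True if prefix lies inside a block assigned to origin."""
    net = ipaddress.ip_network(prefix)
    return any(
        net.version == block.version and net.subnet_of(block)
        for block in allocations.get(origin, [])
    )


def vet_requests(origin, requested):
    """Split requested registry entries into (accepted, rejected) lists."""
    accepted, rejected = [], []
    for prefix in requested:
        (accepted if is_covered(origin, prefix) else rejected).append(prefix)
    return accepted, rejected


if __name__ == "__main__":
    # The second entry stands in for the provisioner's typo.
    ok, bad = vet_requests("AS64500", ["192.0.2.0/25", "12.0.0.0/8"])
    print("accepted:", ok)
    print("rejected:", bad)
```

The check is only as trustworthy as the allocation table behind it, which is the "verifiable data model" the message says is missing.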
Re: Yahoo outage summary
* Valdis Kletnieks:

> (Yes, I know the jury is still out on what really happened at L3-Hanaro.
> Doesn't change the fact that we collectively shoot ourselves in the foot
> because providers will believe the most implausible things from their
> neighbors, like announcements for 128/1 ;)

Well, if L3 creates its filters based on RADB entries (which is still considered a RR, isn't it?), they will accept a 213/8 announcement. 8-(

128/1 isn't too far away, I fear.

--
Florian Weimer <[EMAIL PROTECTED]>
BFK edv-consulting GmbH http://www.bfk.de/
Kriegsstraße 100 tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99
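A crude guard against the most implausible announcements is a prefix-length bound, sketched below. The /8 and /24 limits are assumptions chosen for the example (a common convention, not anything stated in this thread), and the sketch covers IPv4 only.

```python
import ipaddress

MIN_LEN_V4 = 8    # assumed: nothing shorter than a /8 is believable
MAX_LEN_V4 = 24   # assumed: nothing longer than a /24 is accepted


def plausible(prefix):
    """Return True if an IPv4 prefix falls within the assumed length bounds."""
    try:
        net = ipaddress.ip_network(prefix, strict=True)
    except ValueError:
        return False          # malformed, or host bits set
    if net.version != 4:
        return False          # this sketch only covers IPv4
    return MIN_LEN_V4 <= net.prefixlen <= MAX_LEN_V4


if __name__ == "__main__":
    for candidate in ("128.0.0.0/1", "213.0.0.0/8", "192.0.2.0/24", "192.0.2.1/32"):
        print(candidate, plausible(candidate))
```

A 128/1 fails this check, but a 213/8 passes it, which is the point made above: length bounds alone do not catch an entry that is plausible in shape and wrong in substance.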
Re: Yahoo outage summary
On Jul 9, 2007, at 10:47 AM, [EMAIL PROTECTED] wrote:

> On Mon, 09 Jul 2007 02:18:25 -, "Chris L. Morrow" said:
>
>> While S*BGP seem like they may offer additional protections and
>> additional knobs to be used for protecting 'us' from 'them', the very
>> basics are obviously not being done so added complexity is not going
>> to really help :( Or, perhaps its not that its not going to help its
>> just not going to get done because even prefix-lists are 'too hard',
>> apparently.
>
> "Wow, prefix-lists are *hard*" -- BGP Barbie..
>
> You'd think that by now, we as an industry could do better than that.

I agree that we need something better but nobody has shown me a better system than prefix lists and irr that actually *works*.

The simple truth is that prefix lists ARE hard to manage. There are a lot of folks that have complex relationships or don't see why they should register their routes. Some people lack tools and automation to make it work or to manage their networks.

It would be nice to see everyone filter routes, including those from even transit and large peers. I don't think we will be able to ignore this forever. I also do not see the status quo changing soon either.
Re: Yahoo outage summary
On Mon, 9 Jul 2007 [EMAIL PROTECTED] wrote:

> On Mon, 09 Jul 2007 02:18:25 -, "Chris L. Morrow" said:
>
> > While S*BGP seem like they may offer additional protections and additional
> > knobs to be used for protecting 'us' from 'them', the very basics are
> > obviously not being done so added complexity is not going to really help
> > :( Or, perhaps its not that its not going to help its just not going to
> > get done because even prefix-lists are 'too hard', apparently.
>
> "Wow, prefix-lists are *hard*" -- BGP Barbie..

shopping anyone?

> You'd think that by now, we as an industry could do better than that.

I think that over all, over a goodly period of time, we are... we occasionally step on the wrong end of the rake still :(

> (Yes, I know the jury is still out on what really happened at L3-Hanaro.

from some other conversations about this, this seems to be a similar problem to what happened to NY-Edison about 1.5/2 years ago now (panix.com route hijackage)... 'auto filter from IRR data' without some form of checking for proper authority.

Of course, now that I stirred the 'l3 shoulda filtered' pot I should probably also stir the 'large ISP customers should outbound prefix-filter' pot. It's very likely that they DO filter outbound, at least to pref routes from place to place, perhaps twin failures caught them? :(

I think Marcus, Randy, Steve, Lixia all are getting at an underlying issue: "The interwebs are not as trivial to the world as they once were"

So more strict control and operational due diligence should be on everyone's plate... At least for basics like making sure the routing system functions properly going forward. Anyway, should be interesting to get some more details on what happened if they are ever to become available.

-Chris
Re: Yahoo outage summary
On Mon, 09 Jul 2007 02:18:25 -, "Chris L. Morrow" said:

> While S*BGP seem like they may offer additional protections and additional
> knobs to be used for protecting 'us' from 'them', the very basics are
> obviously not being done so added complexity is not going to really help
> :( Or, perhaps its not that its not going to help its just not going to
> get done because even prefix-lists are 'too hard', apparently.

"Wow, prefix-lists are *hard*" -- BGP Barbie..

You'd think that by now, we as an industry could do better than that.

(Yes, I know the jury is still out on what really happened at L3-Hanaro. Doesn't change the fact that we collectively shoot ourselves in the foot because providers will believe the most implausible things from their neighbors, like announcements for 128/1 ;)
Re: Vericenter Denver Outage
Outage with the primary Sprintlink connection. Still no ETR.

James Baldwin

On Jul 9, 2007, at 2:09 AM, James Baldwin wrote:

> Does anyone have further information on the Vericenter Denver outage?
> Support is aware of the issue; however, they could not give me a problem
> description or an ETR at the time of ticket creation.
>
> James Baldwin
Vericenter Denver Outage
Does anyone have further information on the Vericenter Denver outage? Support is aware of the issue; however, they could not give me a problem description or an ETR at the time of ticket creation.

James Baldwin