do not filter your customers - part2

2012-02-27 Thread fredrik danerklint
If we are going to get anywhere with this issue, how about making sure 
the routing/prefix databases are correct first?


Please see:
https://www.fredan.se/temp/prefixes.tar

In that file you will find 'not_allowed_to_announce6', which contains
about 2307 IPv6 prefixes that are either not in any routing/prefix 
database OR were submitted to one with an error (probably in the syntax 
of the prefix).


Which brings us to the next question.

Why on earth is it possible to submit a faulty prefix into a database 
today? Why is there (basically) no verification at all?

Please take a look at 'databases_to_prefixes.sh' to see what's going on
(ok, some of the databases are probably for internal use only and we
need to filter those - but there is so much more that needs to be filtered).

Also in that file you will find 'prefixes4' and 'prefixes6', which 
contain all the prefixes after all the checking has been done (one 
prefix per line). These two files could be really useful for everybody 
in this community if someone (like the RIRs) made them available to 
all of us, so we don't have to download all the databases, just the 
prefixes.
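
To make the "no verification at all" point concrete, here is a minimal
sketch of the kind of syntax check a database could run before accepting
a prefix. It is not what 'databases_to_prefixes.sh' does; the function
names are made up for illustration, and it only uses Python's standard
ipaddress module.

```python
import ipaddress


def parse_prefix(text):
    """Return a network object for a well-formed prefix, else None.

    Hypothetical helper: rejects bare addresses, bad prefix lengths,
    and prefixes that have host bits set.
    """
    text = text.strip()
    if "/" not in text:
        return None  # bare address, no prefix length given
    try:
        # strict=True rejects entries such as 2001:db8::1/32
        return ipaddress.ip_network(text, strict=True)
    except ValueError:
        return None


def split_prefixes(lines):
    """Sort candidate lines into IPv4, IPv6 and rejected lists."""
    v4, v6, rejected = [], [], []
    for line in lines:
        net = parse_prefix(line)
        if net is None:
            rejected.append(line.strip())
        elif net.version == 4:
            v4.append(str(net))
        else:
            v6.append(str(net))
    return v4, v6, rejected
```

A check like this only catches syntax errors. Whether the submitter is
actually entitled to the prefix is a separate problem.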


(And I know that AS52011 is announcing two prefixes which are not in the 
databases. Thank you very much).


--
//fredan



Re: do not filter your customers

2012-02-25 Thread Dobbins, Roland

On Feb 26, 2012, at 7:55 AM, Christopher Morrow wrote:

> I'm not sure... here's a few ideas though to toss on the fire of thought:

Concur with this general approach, which is a longer-term effort - but it would 
be nice if there was some discrete, limited-scope knob which could conceivably 
be added as a point-feature request, thereby having some chance of actually 
making it into shipping code at some point before the next millennium, and 
which won't cause more harm than good.

;>

---
Roland Dobbins  // 

  Luck is the residue of opportunity and design.

   -- John Milton




Re: do not filter your customers

2012-02-25 Thread Dobbins, Roland

On Feb 26, 2012, at 5:39 AM, Dongting Yu wrote:

>  you drop updates, which would lead to inconsistent views on the two sides of 
> the session.

Views are inconsistent by design - there is no state synchronization.  All a 
sender knows is that he sent the updates, not what (if anything) was done with 
them by the receiver.

> What if a legitimate update was among the large burst?

Presumably, soft-reset would be initiated after the throttling.

But per previous email, if any sort of throttling is to be done at all, it's 
probably best that it is done by the sender.

---
Roland Dobbins  // 

  Luck is the residue of opportunity and design.

   -- John Milton




Re: do not filter your customers

2012-02-25 Thread Christopher Morrow
On Sat, Feb 25, 2012 at 12:20 PM,   wrote:
> On Fri, 24 Feb 2012 21:39:37 EST, Christopher Morrow said:
>
>> The knobs available are sort of harsh all the way around though today :(
>
> So what would be a good knob if it was available?  I've seen about forty-leven
> people say the current knobs suck, but no real proposals of "what would really
> rock is if we could"

I'm not sure... here's a few ideas though to toss on the fire of thought:

1) break the process up inside the router, provide another set of
places to block and tackle the problem.

2) better metric the problem for operations staff

3) automate the problem 'better' (inside a set of sane boundaries)

I think in 1 I want to be able to be assured that inbound data to a
bgp peer will not cause problems for all other peers on the same
device. Keep the parsing, memory and cpu management separate from the
main routing management inside the router, provide controls on these
points configurable at a per-peer level.

That way you could limit things like:
  - each peer able to take a maximum amount of RAM, start discarding
routes over that limit, alarm at a configurable percentage of the
limit.
  - each peer could consume only a set percentage of CPU resources,
better would be the ability to pin bgp peer usage to a particular CPU
(or set of CPUs) and other route processing on another
CPU/set-of-CPUs.
  - interfaces between the bgp speaker, receiver, ingest and databases
could all be standardized, simple and auditable as well. If the peer
sent a malformed update only that peering session would die, if the
parsing of the update caused a meltdown again only the single peer
would be affected. The interface between the code speaking to the peer
and the RIB could be more robust and more resilient to errors.
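
A rough sketch of the per-peer limit idea from the first bullet, purely
for illustration. It uses a route count as a stand-in for RAM, and the
class and attribute names are invented; no router exposes this
interface as far as this thread establishes.

```python
class PeerBudget:
    """Per-peer route budget with a configurable alarm threshold.

    Hypothetical model: counts routes instead of bytes of RAM.
    """

    def __init__(self, max_routes, alarm_pct=80):
        self.max_routes = max_routes
        self.alarm_at = max_routes * alarm_pct / 100.0
        self.routes = set()
        self.discarded = 0
        self.alarmed = False

    def announce(self, prefix):
        """Store the route if the peer is under budget; return True if stored."""
        if prefix in self.routes:
            return True  # re-announcement, no extra memory
        if len(self.routes) >= self.max_routes:
            self.discarded += 1  # over the limit: discard, keep the session
            return False
        self.routes.add(prefix)
        if len(self.routes) >= self.alarm_at:
            self.alarmed = True  # would raise a syslog/SNMP alarm here
        return True

    def withdraw(self, prefix):
        self.routes.discard(prefix)
```

Each peer gets its own PeerBudget, so one peer hitting its limit does
not take anything away from the others on the same device.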

for 2, I think having more data available about avg rate of increase,
max rate of increase, average burst size and predicted time to overrun
would be helpful. Most of this one could gather with some smart SNMP
tricks I suspect... on the other hand, just reacting to the syslog
messages in a timely fashion works :)
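
As an illustration of the numbers in 2, something like the following
could be run over periodically polled prefix counts. The sample format
and function name are assumptions, and how the counts are collected
(SNMP or otherwise) is left out. Average burst size is not covered.

```python
def growth_stats(samples, limit):
    """Summarise prefix-count growth for one peer.

    samples: list of (timestamp_seconds, prefix_count), oldest first.
    limit:   the configured max-prefix value for the peer.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    # Rate of increase between each pair of consecutive samples.
    rates = [
        (c1 - c0) / (t1 - t0)
        for (t0, c0), (t1, c1) in zip(samples, samples[1:])
        if t1 > t0
    ]
    (t_first, c_first), (t_last, c_last) = samples[0], samples[-1]
    avg_rate = (c_last - c_first) / (t_last - t_first)
    headroom = limit - c_last
    if headroom <= 0:
        eta = 0.0  # already at or over the limit
    elif avg_rate > 0:
        eta = headroom / avg_rate  # linear extrapolation only
    else:
        eta = None  # flat or shrinking, no predicted overrun
    return {
        "avg_rate": avg_rate,
        "max_rate": max(rates),
        "seconds_to_overrun": eta,
    }
```

The time-to-overrun figure is a straight-line guess from the average
rate, so it says nothing useful about a sudden burst.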

for 3, automate the reaction to syslog/snmp messages, increasing the
thresholds if there hasn't been an increase in the last X hours and
the limit is not above Y percent of a full table already. (and send a
note to the NOC ticket system for historical preservation).
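
A sketch of the rule in 3, with X and Y as parameters. The step size
and the default values are invented for the example, and the NOC ticket
part is left out.

```python
def next_limit(limit, hours_since_last_raise, full_table,
               min_hours=24, cap_pct=50, step_pct=10):
    """Return the new max-prefix limit after a threshold warning.

    min_hours is "X" and cap_pct is "Y" from the description above;
    step_pct and all defaults are assumptions.
    """
    if hours_since_last_raise < min_hours:
        return limit  # raised too recently, leave it for a human
    cap = full_table * cap_pct // 100
    if limit >= cap:
        return limit  # already at Y percent of a full table
    raised = limit + limit * step_pct // 100
    return min(raised, cap)
```

If the function returns the same limit it was given, the automation has
declined to act and the alarm should go to a person instead.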

These too have flaws... I'm not sure there's a good answer to this though :(

-chris



Re: do not filter your customers

2012-02-25 Thread Dongting Yu
Let me chime in and attempt to explain why a couple of solutions I've
seen so far in this thread won't work:

- rate-limiting/throttling updates: BGP by protocol does not repeat
updates; if an update is sent then the sender assumes that the
receiver has received it and will remember it until a change or a
withdrawal. If you rate limit announcements, either you hold things
off in a buffer, which would need a very large buffer, or you drop
updates, which would lead to inconsistent views on the two sides of
the session. What if a legitimate update was among the large burst?

- max-prefix: it is currently used to prevent large bursts of updates,
but it won't stop something like the YouTube incident, which was more
targeted. Perhaps the YT incident falls into a different category from
'route leaks', but without a clear definition of the latter we simply
cannot say. Also, max-prefix causes problems with slowly growing peers
or peers with new large customers when people don't bother to adjust
the max-prefix value accordingly.

- max-prefix in the form of a percentage: some peers actually are very
stable in the number of prefixes they announce, and some are not. Both
are probably valid depending on your business model/requirements. An x%
threshold may be too lax for one company but too tight for another.
Figuring out the right number (or even a ballpark) is probably a lot
harder than for a simple max-prefix value. I have seen ASes that
announce anywhere from hundreds to tens of thousands of prefixes on a
periodic basis. Percentages also don't work so well for ASes with a
single-digit or low-double-digit number of prefixes.


Dongting



Re: do not filter your customers

2012-02-25 Thread Nick Hilliard
On 25/02/2012 06:07, Shane Amante wrote:
>  OTOH, I would completely agree with
> Geoff's comment that the policy language of RPSL has the ability to
> express routing _policy_, a.k.a. "intent", recursively across multiple
> ASN's ... (please note that I'm specifically talking about the technical
> capability of the policy language of RPSL, not the actual _data_
> contained in the IRR).

routing policy concerns the interaction of two classes of object (prefixes
and asns) as handled between asns.  Problem is, while you can describe AS
interaction between ASNs and some prefix stuff between ASNs, rpsl doesn't
really have proper support to link the two - i.e. tying prefixes to
specific paths and all that jazz.  Then again, neither do most routers.  It
hardly matters - without a secure means of path validation, the path is
purely advisory and you can only barely trust the peer asn in the path.

So RPSL isn't really a solution for describing how prefixes ought to be
handled to inter-asn connectivity, and even if it were and routers could
handle as->prefix mapping properly, our routers couldn't handle it for
large-scale interconnection links due to configuration management
limitations.  Put simply, managing enormous lists of prefixes and piles of
ASN paths (in regex form) causes router RPs to asplode.   So from the point
of view of prefix distribution control, some sort of live query system is
required.

To this end, rpki with as path validation (if we actually had an
implementation which checked all the boxes in the draft list of
requirements) might work.  My point was that at the moment, it's vapour and
it's not clear at this point that it will ever change into something more
solid, particularly given the challenging feature list that we want it to
cope with, and given the constraints of what people already do with their
policy routing.

And even if it does ever work, it immediately opens up an exquisitely ugly
can of worms at layers 9 and above.  Call me conservative, but I have not
been convinced that RPKI solves more problems than it creates.

Your other concerns about as path validation implementation are indeed
difficult to address.

Nick



Re: do not filter your customers

2012-02-25 Thread Tom Hill

On 25/02/12 17:20, valdis.kletni...@vt.edu wrote:

> On Fri, 24 Feb 2012 21:39:37 EST, Christopher Morrow said:
>
>> The knobs available are sort of harsh all the way around though today :(
>
> So what would be a good knob if it was available?  I've seen about forty-leven
> people say the current knobs suck, but no real proposals of "what would really
> rock is if we could"


I've suggested before that a configured increase limit, in percentage 
might be *slightly* more intelligent than the current hard limit 
settings (i.e. max-prefixes).


Typically you're going to get, what, 100 routes? Maybe less, maybe more. 
If that rises by 100%, drop the session. Weird customer? 200%.
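
For what it's worth, the check itself would be tiny. A sketch, assuming
the router keeps some baseline prefix count per session (how that
baseline is chosen is the hard part and is not shown); the function
name is made up.

```python
def should_drop(baseline, current, max_increase_pct=100):
    """True if the prefix count rose by max_increase_pct or more.

    baseline: prefix count the session is normally expected to carry.
    current:  prefix count right now.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    # Integer form of: (current - baseline) / baseline * 100 >= pct
    return (current - baseline) * 100 >= max_increase_pct * baseline
```

With a baseline of 100 routes, should_drop(100, 200) trips at the
default 100%, while the "weird customer" setting should_drop(100, 250,
200) does not. Note that should_drop(2, 4) also trips, so a very small
customer adding two prefixes would lose the session.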


Tom



Re: do not filter your customers

2012-02-25 Thread Valdis . Kletnieks
On Fri, 24 Feb 2012 21:39:37 EST, Christopher Morrow said:

> The knobs available are sort of harsh all the way around though today :(

So what would be a good knob if it was available?  I've seen about forty-leven
people say the current knobs suck, but no real proposals of "what would really
rock is if we could"




Re: do not filter your customers

2012-02-25 Thread Randy Bush
> as would be solving world hunger, war, bad cooking, especially bad
> cooking.
> 
> route leaks, as much as i understand them
>  o are indeed bad ops issues
>  o are not security per se
>  o are a violation of business relationships
>  o and 20 years of fighting them have not given us any significant
>    increase in understanding, formal definition, or prevention.

let me try to express how i see the problem.  to do this rigorously, i
would need to form the transitive closure of the business policies of
every inter-provider link on the internet.

why i say it is per-link and not just inter-as (which would be hard
enough) is that i know a *lot* of examples where two ASes have different
business policies on different links.  [ i'll exchange se asian routes
with you in hong kong, but only sell you transit in tokyo.  we have two
links in frankfurt, one local peering and one international transit. ]

it is not just one-hop because telstra was 'supposed to' pass some
customers' customers' routes to optus.

i find this daunting.  but i would *really* like to be able to
rigorously solve it.  please please please explain to me how it is
simpler than this.

randy



Re: do not filter your customers

2012-02-25 Thread Randy Bush
> So, it is not OK for traffic to be /intentionally/ diverted through a
> malevolent AS

traffic?  i do not hold the fantasy that traffic is highly correlated to
the control plane.  see http://archive.psg.com/optometry.pdf if you need
a disproof of the fantasy.

> but it is OK for traffic to be /unintentionally/ diverted through a
> (possibly) malevolent AS?

intent?  how the hell do i know intent?  i can barely read my own mind
let alone telstra's.

and i very much doubt telstra thought they were _attacking_ optus.

randy



Re: do not filter your customers

2012-02-24 Thread Dobbins, Roland

On Feb 25, 2012, at 2:15 PM, Christopher Morrow wrote:

> if the rate is 1/ms ... I can fill the rib in 2million ms ... ~30mins?  Rate 
> alone isn't the problem :( size matters.

Sure; the idea is that some sort of throttling, coupled with overall size 
limitations, might be useful.

> People aren't trying to actively make convergence take longer, that I've seen 
> at least.

Yes, and in most cases, the goal is to speed up convergence.  I'm positing that 
in these particular circumstances, fast convergence is not necessarily 
desirable, and that 'these particular circumstances' generally involve large 
numbers of updates which are not associated with turning up a new peering 
session being received over a short period of time.

What about routing update transmission throttling, instead?  Does that make any 
more sense, in terms of being liberal with what we accept and conservative in 
what (or how much, how quickly) we send?

> dropping a single customer sucks, dropping an entire edge device is far far 
> worse.

I agree; I don't mean to imply that anything should be dropped.  Again, 
apologies for being unclear.

---
Roland Dobbins  // 

  Luck is the residue of opportunity and design.

   -- John Milton




Re: do not filter your customers

2012-02-24 Thread Christopher Morrow
On Fri, Feb 24, 2012 at 10:52 PM, Dobbins, Roland  wrote:
>
>> X prefixes/packets in Y seconds/milliseconds doesn't keep the peer from 
>> blowing up your RIB,
>
> How so?  If the configured parameters are exceeded, stop accepting/inserting 
> updates until this is no longer the case.  Exceptions would be made for 
> peering session establishment, it would take effect after that.
>

if the rate is 1/ms ... I can fill the rib in 2million ms ... ~30mins?
Rate alone isn't the problem :( size matters.
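
Spelling out that arithmetic (the 2 million figure is just the rough
RIB capacity used above; the function name is made up):

```python
def fill_time_minutes(routes, updates_per_ms):
    """Minutes needed to insert `routes` prefixes at a fixed update rate."""
    milliseconds = routes / updates_per_ms
    return milliseconds / 1000 / 60


# 2,000,000 routes at 1 per ms is 2,000 seconds, a bit over half an hour.
minutes = fill_time_minutes(2_000_000, 1)
```

So a rate cap on its own only delays the overrun; it still needs a size
cap next to it.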

>> it does slow down convergence :(
>
> Yes, but is this always necessarily a Bad Thing?  For example, in this 
> particular circumstance (and many like it, cf. the AS7007 incident, et al.), 
> it could be argued that [incorrect?  undesirable?  premature?  pessimal?] 
> convergence led to a poor result, could it not?
>

it's not clear, to me at least, that slowing convergence is good. it
seems to me that folk do all manner of 'interesting' things in order
to limit convergence time. People aren't trying to actively make
convergence take longer, that I've seen at least.

>> If you have 200 peers on an edge device, dropping the whole device's routing 
>> capabilities because of one AS7007/AS1221/AS9121 .. isn't cool
>> to your network nor the other customers on that device :(
>
> Apologies for being unclear; I wasn't suggesting dropping or removing 
> anything, but rather refusing to further accept/insert updates from a given 
> peer until the update rate from said peer slowed to within configured 
> parameters.
>

yup, I think I jumped a bit around, my penalizing every other customer
was a reference to not having any limiting system in place.

>> max-prefix as it exists today at least caps the damage at one customer.
>
> But it doesn't, really, does it?  The effects cascade in an anisotropic 
> manner throughout a potentially large transit cone.
>

dropping a single customer sucks, dropping an entire edge device is
far far worse.

>> The knobs available are sort of harsh all the way around though today :(
>
> Concur again, sigh.

hurray! sort of.

thanks!
-chris



Re: do not filter your customers

2012-02-24 Thread Shane Amante

On Feb 24, 2012, at 5:49 PM, Randy Bush wrote:
>> Solving for route leaks is /the/ "killer app" for BGPSEC.
> 
> as would be solving world hunger, war, bad cooking, especially bad
> cooking.
> 
> route leaks, as much as i understand them
>  o are indeed bad ops issues
>  o are not security per se
>  o are a violation of business relationships
>  o and 20 years of fighting them have not given us any significant
>    increase in understanding, formal definition, or prevention.
> 
> i would love to see progress on the route leak problem.  i do not
> confuddle it with security.


So, it is not OK for traffic to be /intentionally/ diverted through a 
malevolent AS, but it is OK for traffic to be /unintentionally/ diverted 
through a (possibly) malevolent AS?  Who's to judge that the security 
exposure[1] of the latter is not identical to (or worse than) the former?

-shane

[1] dropped traffic, traffic analysis, etc. 


Re: do not filter your customers

2012-02-24 Thread Shane Amante
Nick,

On Feb 24, 2012, at 4:16 PM, Nick Hilliard wrote:
> On 24/02/2012 20:04, Shane Amante wrote:
>> Solving for route leaks is /the/ "killer app" for BGPSEC.  I can't
>> understand why people keep ignoring this.
> 
> I'd be interested to hear your opinions on exactly how rpki in its current
> implementation would have prevented the optus/telstra problem.  Could you
> elaborate?

I apologize if I misled you, but I did not claim that the RPKI, in its current 
ROA implementation, *would* have prevented this specific route leak related to 
Optus/Telstra.  OTOH, I would completely agree with Geoff's comment that the 
policy language of RPSL has the ability to express routing _policy_, a.k.a. 
"intent", recursively across multiple ASN's ... (please note that I'm 
specifically talking about the technical capability of the policy language of 
RPSL, not the actual _data_ contained in the IRR).

Or, to put it a different way, the reachability information carried in BGP is 
the end-result/output of policy.  One needs to understand the *input*, a.k.a.: 
the policy/intent, if they are to validate the output, namely the reachability 
information carried in BGP.  Unfortunately, denying this reality is not going 
to make it "go away".


> Here's a quote from draft-ietf-sidr-origin-ops:
> 
>>   As the BGP origin AS of an update is not signed, origin validation is
>>   open to malicious spoofing.  Therefore, RPKI-based origin validation
>>   is designed to deal only with inadvertent mis-advertisement.
>> 
>>   Origin validation does not address the problem of AS-Path validation.
>>   Therefore paths are open to manipulation, either malicious or
>>   accidental.
> 
> An optus/telstra style problem might have been mitigated by an rpki based
> full path validation mechanism, but we don't have path validation.  Right
> now, we only have a draft of a list of must-have features -
> draft-ietf-sidr-bgpsec-reqs.  This is only the first step towards designing
> a functional protocol, not to mind having running code.

As one example, those "must-have features" have not, yet[1], accounted for the 
various "kinky" things we all do to manipulate the AS_PATH in the wild, for 
lots of very important business reasons, namely: ASN consolidation through 
knobs like "local-as alias" in JUNOS-land and "local-as no-prepend replace-as" 
in IOS-land, which have existed in shipping code for several years and are in 
active, widespread use and will continue to remain so[2].  Furthermore, given 
the current design proposal on the table of a BGPSEC transmitter 
forward-signing the "Target AS", as learned from a receiver in the BGP OPEN 
message, this could make it impossible to do ASN consolidation in the future, 
(unless I'm misunderstanding something).

-shane

[1] I have asked at the last SIDR WG meeting in Taipei specifically for 
this to be accounted for, but I don't see this in the current rev of the draft 
you cite. Perhaps others should chime in on the SIDR WG mailing list if they 
are aware of the use of ASN-consolidation knobs and consider them a critical 
factor to consider during the design process, particularly so they are looked 
at during the earliest stages of the design.
[2] I haven't heard of any vendors stating that they are intending to EOL or 
not support those features any more, but it would be amusing to see the 
reaction they would get if they tried.  :-)


Re: do not filter your customers

2012-02-24 Thread Jeff Young

On 25/02/2012, at 12:59 PM, Christopher Morrow wrote:

> On Fri, Feb 24, 2012 at 8:24 PM, Jeffrey S. Young  wrote:
>> 1.  Make your customers register routes, then filter them.
>> (may be time for big providers to put routing tools into
>> open source for the good of the community - make it
>> less hard?)
> 
> not a big provider, but ras@e-gerbil did release irr-tools no?

And other providers out there have extensive tool sets from which
we could all benefit.  I'll let them chime in if they choose.

> 
>> 2.  Implement the "1-hop" hack to protect your BGP peering.
>> 
>> 98% of problem solved on the Internet today
>> 
> 
> which problem? GTSM only protects your actual bgp session, not the
> content of the session(s) or the content across the larger network.
> 

The security problem, but it was a hedge on my part.

>> 3.  Implement a "# of routes-type" filter to make your peers
>> (and transit customers) phone you if they really do want
>> to add 500,000 routes to your session ( or the wrong set
>> of YouTube routes...).
> 
> max-prefix already exists... sometimes it works, sometimes it's a
> burden. It doesn't tell you anything about the content of the session
> though (the YT routes example doesn't actually work that way)

Depends on how many /24's the Pakistan(?) Telecom guy let into the 
network to block the YT content...  but you're right, the example would 
have been better in support of #1.  
(had PT been forced to register routes before sending them and his 
upstream been filtering based on those routes we'd have never heard 
about it.)
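
A minimal sketch of what "filtering based on those routes" amounts to,
assuming the registered routes have already been pulled from a registry
into a list of (prefix, maximum accepted length) pairs. The names and
the documentation prefixes are made up for the example.

```python
import ipaddress


def is_registered(announced, registered):
    """True if `announced` is covered by a registered route.

    registered: list of (prefix, max_len) pairs; an announcement is
    accepted if it sits inside the prefix and is no longer than max_len.
    """
    net = ipaddress.ip_network(announced)
    for prefix, max_len in registered:
        reg = ipaddress.ip_network(prefix)
        if (net.version == reg.version
                and net.subnet_of(reg)
                and net.prefixlen <= max_len):
            return True
    return False


# Hypothetical customer registration.
customer_routes = [("198.51.100.0/24", 24), ("2001:db8::/32", 48)]
```

With that list, is_registered("203.0.113.0/24", customer_routes) is
False, so an unregistered /24 never gets past the customer session no
matter how few routes are involved.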

> 
>> 99.9% of problem solved.
> 
> ? not sure about that number
> 

>> 4.  Implement BGP-Sec
>> 
>> 99.91% of "this" problem solved.
>> 
>> Because #1 is 'just too hard' and because #4 is just too sexy
>> as an academic pursuit we all suffer the consequences.  It's
> 
> there are folks working on the #4 problem, not academics even. It's
> not been particularly sexy though :(
> 

Point was that the problem is mostly operational.  We have tools
to deal with the problem but the operational costs are high.  For 
fifteen (below) years we've treated this (route leak) as "not my problem"
because it's too costly.   Every 6-12 months it comes back to bite
us.  If the cost of an outage every 6 months+ is low compared to
solving the problem, the community will endure the outage. If we 
want it to stop today we can make it stop but stopping it has a cost.

“...a glitch at a small ISP... triggered a major outage in Internet access 
across the country. The problem started when MAI Network Services
...passed bad router information from one of its customers onto Sprint.”
-- news.com, April 25, 1997

jy





Re: do not filter your customers

2012-02-24 Thread Dobbins, Roland

On Feb 25, 2012, at 9:39 AM, Christopher Morrow wrote:

> it seems to me that most of the options discussed for this are .. bad, in one 
> dimension or another :(

Concur.

> X prefixes/packets in Y seconds/milliseconds doesn't keep the peer from 
> blowing up your RIB,

How so?  If the configured parameters are exceeded, stop accepting/inserting 
updates until this is no longer the case.  Exceptions would be made for peering 
session establishment, it would take effect after that.

> it does slow down convergence :(

Yes, but is this always necessarily a Bad Thing?  For example, in this 
particular circumstance (and many like it, cf. the AS7007 incident, et al.), 
it could be argued that [incorrect?  undesirable?  premature?  pessimal?] 
convergence led to a poor result, could it not?

> If you have 200 peers on an edge device, dropping the whole device's routing 
> capabilities because of one AS7007/AS1221/AS9121 .. isn't cool
> to your network nor the other customers on that device :(

Apologies for being unclear; I wasn't suggesting dropping or removing anything, 
but rather refusing to further accept/insert updates from a given peer until 
the update rate from said peer slowed to within configured parameters.

> max-prefix as it exists today at least caps the damage at one customer.

But it doesn't, really, does it?  The effects cascade in an anisotropic manner 
throughout a potentially large transit cone.

> The knobs available are sort of harsh all the way around though today :(

Concur again, sigh.

---
Roland Dobbins  // 

  Luck is the residue of opportunity and design.

   -- John Milton




Re: do not filter your customers

2012-02-24 Thread Christopher Morrow
On Fri, Feb 24, 2012 at 9:12 PM, Dobbins, Roland  wrote:
>
> On Feb 25, 2012, at 8:59 AM, Christopher Morrow wrote:
>
>> max-prefix already exists... sometimes it works, sometimes it's a burden.
>
> Some sort of throttle - i.e., allow only X number of routing updates within Y 
> number of [seconds?  milliseconds? BGP packets?] would be more useful, IMHO.  
> If the configured rate is exceeded, maintain the session but stop accepting 
> further updates until either manually reset or the rate of updates falls back 
> within acceptable parameters.

it seems to me that most of the options discussed for this are .. bad,
in one dimension or another :(

typical max-prefix today will dump a session, if you exceed the number
of prefixes on the session... good? maybe? bad? maybe? did the peer
fire up a full table to you? or did you just not pay attention to the
log messages saying: "Hey, joe's going to need an update shortly..."

X prefixes/packets in Y seconds/milliseconds doesn't keep the peer
from blowing up your RIB, it does slow down convergence :(

If you have 200 peers on an edge device, dropping the whole device's
routing capabilities because of one AS7007/AS1221/AS9121 .. isn't cool
to your network nor the other customers on that device :( max-prefix
as it exists today at least caps the damage at one customer.

The knobs available are sort of harsh all the way around though today :(

-chris



Re: do not filter your customers

2012-02-24 Thread Julien Goodwin
On 25/02/12 13:12, Dobbins, Roland wrote:
> 
> On Feb 25, 2012, at 8:59 AM, Christopher Morrow wrote:
> 
>> max-prefix already exists... sometimes it works, sometimes it's a burden.
> 
> Some sort of throttle - i.e., allow only X number of routing updates within Y 
> number of [seconds?  milliseconds? BGP packets?] would be more useful, IMHO.  
> If the configured rate is exceeded, maintain the session but stop accepting 
> further updates until either manually reset or the rate of updates falls back 
> within acceptable parameters.


JunOS does have "out-delay", but that's not quite a solution although it
does help stem some prefix flapping issues.



Re: do not filter your customers

2012-02-24 Thread Dobbins, Roland

On Feb 25, 2012, at 8:59 AM, Christopher Morrow wrote:

> max-prefix already exists... sometimes it works, sometimes it's a burden.

Some sort of throttle - i.e., allow only X number of routing updates within Y 
number of [seconds?  milliseconds? BGP packets?] would be more useful, IMHO.  
If the configured rate is exceeded, maintain the session but stop accepting 
further updates until either manually reset or the rate of updates falls back 
within acceptable parameters.
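
To illustrate the sort of knob meant here, a sliding-window sketch: it
counts every update offered by the peer in the last Y seconds and
refuses to insert new ones while that count is above X. The class name
and interface are invented, and what happens to the refused updates
afterwards is not modelled.

```python
from collections import deque


class UpdateThrottle:
    """Refuse updates while the peer's recent update rate is too high.

    Hypothetical model: max_updates is "X", window_seconds is "Y".
    """

    def __init__(self, max_updates, window_seconds):
        self.max_updates = max_updates
        self.window = window_seconds
        self.seen = deque()  # arrival times of all recent updates

    def offer(self, now):
        """Record an update arriving at time `now`; True means accept it."""
        self.seen.append(now)
        # Forget arrivals that have aged out of the window.
        while now - self.seen[0] >= self.window:
            self.seen.popleft()
        # The session stays up either way; over the rate, nothing is inserted.
        return len(self.seen) <= self.max_updates

    def reset(self):
        """Manual reset: start counting from scratch."""
        self.seen.clear()
```

Because refused updates still count towards the rate, acceptance only
resumes once the peer itself slows down or someone calls reset().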

---
Roland Dobbins  // 

  Luck is the residue of opportunity and design.

   -- John Milton




Re: do not filter your customers

2012-02-24 Thread Christopher Morrow
On Fri, Feb 24, 2012 at 8:24 PM, Jeffrey S. Young  wrote:
> 1.  Make your customers register routes, then filter them.
>     (may be time for big providers to put routing tools into
>     open source for the good of the community - make it
>     less hard?)

not a big provider, but ras@e-gerbil did release irr-tools no?

> 2.  Implement the "1-hop" hack to protect your BGP peering.
>
> 98% of problem solved on the Internet today
>

which problem? GTSM only protects your actual bgp session, not the
content of the session(s) or the content across the larger network.

> 3.  Implement a "# of routes-type" filter to make your peers
>     (and transit customers) phone you if they really do want
>     to add 500,000 routes to your session ( or the wrong set
>     of YouTube routes...).

max-prefix already exists... sometimes it works, sometimes it's a
burden. It doesn't tell you anything about the content of the session
though (the YT routes example doesn't actually work that way)

> 99.9% of problem solved.

? not sure about that number

> 4.  Implement BGP-Sec
>
> 99.91% of "this" problem solved.
>
> Because #1 is 'just too hard' and because #4 is just too sexy
> as an academic pursuit we all suffer the consequences.  It's

there are folks working on the #4 problem, not academics even. It's
not been particularly sexy though :(

> a shame that tier one peering agreements didn't evolve with
> a 'filter your customers' clause (aka do the right thing) as well
> as a 'like for like' (similar investments) clause in them.

I'm missing something here... it's not clear to me that 'tier1'
providers matter a whole lot in the discussion. Many of them have
spoken up saying: "Figuring out the downstream matrix in order to put
a prefix-list on my SFP peer is not trivial, and probably not workable
on gear today." (shane I think has even said this here...)

> I'm not downplaying the BGP-SEC work, I think it's valid and
> may one day save us from some smart bunny who wants to
> make a name for himself by bringing the Internet to a halt.  I
> don't believe that's what we're battling here.  We're battling the
> operational cost of doing the right thing with the toolset we have

right, so today you have to do a lot of math/work to figure out if
your customer's prefixes are hers, and if they should be permitted
into your RIB. Tomorrow you COULD get a better end result with less
work and more assurance given a populated resource certification
system.

Extending this somewhat into the land of BGPSEC, you COULD also know
that the route you hear originated from the correct ASN, and later
you'd be able to tell that the path the route traveled was the same as
the AS_PATH in the route...

-chris



Re: do not filter your customers

2012-02-24 Thread Dobbins, Roland

On Feb 25, 2012, at 7:49 AM, Randy Bush wrote:

> i would love to see progress on the route leak problem.  i do not confuddle 
> it with security.

Availability is a key aspect of security - the most important one, in many 
cases/contexts.  The availability of the control plane itself (i.e., being 
stable/resilient enough to continue doing its job even under various forms of 
duress) as well as the availability of the information about paths it 
propagates in order to allow the routing of transit traffic both fall squarely 
within the rubric of security, IMHO.

The disruption of transit traffic routing often caused by route leaks, as in 
this particular case, has a negative impact on the overall availability of 
affected networks/endpoints/applications/services/data.  However, route leaks 
are only one potential cause of such hits to availability - and while there are 
several BCPs which can and should be adopted in order to protect against 
control-plane disruption, they are in many cases honored more in the breach 
than in the observance due to complexity, opex (as is the case with many - some 
would say most - security-related BCPs), and so forth.

The single best thing which could be done to improve the stability/resiliency 
of the control-plane on IP networks in general would be to change the nature of 
the control-plane (not just BGP, but the IGPs, as well) from in-band to 
out-of-band, IMHO.  I know this will probably never happen, but wanted to be 
sure that the point was made in relation to this specific topic for the sake of 
completeness, if nothing else.

---
Roland Dobbins  // 

  Luck is the residue of opportunity and design.

   -- John Milton




Re: do not filter your customers

2012-02-24 Thread Jeffrey S. Young
1.  Make your customers register routes, then filter them.
 (may be time for big providers to put routing tools into 
 open source for the good of the community - make it 
 less hard?)

2.  Implement the "1-hop" hack to protect your BGP peering.

98% of problem solved on the Internet today

3.  Implement a "# of routes-type" filter to make your peers 
 (and transit customers) phone you if they really do want 
 to add 500,000 routes to your session ( or the wrong set
 of YouTube routes...).

99.9% of problem solved.

4.  Implement BGP-Sec

99.91% of "this" problem solved.
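
As a rough illustration of step 1 above (not any provider's actual tooling), a
filter generated from registered routes could look like the Python sketch
below. The prefixes are documentation ranges, the names REGISTERED and
build_filter are made up for this example, and exact-match-only acceptance is
an assumption; real filters often allow a bounded range of more-specifics.

```python
import ipaddress

# Hypothetical registered routes for one customer (documentation prefixes).
REGISTERED = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def build_filter(registered):
    """Return a predicate that accepts only exactly-registered prefixes."""
    allowed = set(registered)

    def accept(prefix):
        return ipaddress.ip_network(prefix) in allowed

    return accept


accept = build_filter(REGISTERED)

# What the customer actually sends, including two things it never registered.
announced = ["192.0.2.0/24", "203.0.113.0/24", "2001:db8::/32", "0.0.0.0/0"]
accepted = [p for p in announced if accept(p)]
rejected = [p for p in announced if not accept(p)]
```

The point of the sketch is only that the filter is derived mechanically from
the registration data, so the hard part is keeping that data correct.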

Because #1 is 'just too hard' and because #4 is just too sexy 
as an academic pursuit we all suffer the consequences.  It's
a shame that tier one peering agreements didn't evolve with
a 'filter your customers' clause (aka do the right thing) as well
as a 'like for like' (similar investments) clause in them.

I'm not downplaying the BGP-SEC work, I think it's valid and
may one day save us from some smart bunny who wants to
make a name for himself by bringing the Internet to a halt.  I
don't believe that's what we're battling here.  We're battling the
operational cost of doing the right thing with the toolset we have
versus waiting for a utopian solution (foolproof and free) that may 
never come.

jy

ps. my personal view.

On 25/02/2012, at 6:26 AM, Danny McPherson  wrote:

> 
> On Feb 24, 2012, at 1:10 PM, Steven Bellovin wrote:
> 
>> But just because we can't solve the whole problem, does that
>> mean we shouldn't solve any of it?
> 
> Nope, we most certainly should decompose the problem into 
> addressable elements, that's core to engineering and operations.
> 
> However, simply because the currently envisaged solution 
> doesn't solve this problem doesn't mean we shouldn't 
> acknowledge it exists.
> 
> The IETF's BGP security threats document [1]  "describes a threat 
> model for BGP path security", which constrains itself to the 
> carefully worded SIDR WG charter, which addresses route origin 
> authorization and AS_PATH "semantics" -- i.e., this "leak" 
> problem is expressly out of scope of a threats document
> discussing BGP path security - eh? 
> 
> How the heck we can talk about BGP path security and not 
> consider this incident a threat is beyond me, particularly when it 
> happens by accident all the time.  How we can justify putting all 
> that BGPSEC and RPKI machinery in place and not address this 
> "leak" issue somewhere in the mix is, err.., telling.
> 
> Alas, I suspect we can all agree that experiments are good and 
> the market will ultimately decide.
> 
> -danny
> 
> [1] draft-ietf-sidr-bgpsec-threats-02
> 



Re: do not filter your customers

2012-02-24 Thread Randy Bush
> Solving for route leaks is /the/ "killer app" for BGPSEC.

as would be solving world hunger, war, bad cooking, especially bad
cooking.

route leaks, as much as i understand them
  o are indeed bad ops issues
  o are not security per se
  o are a violation of business relationships
  o and 20 years of fighting them have not given us any significant
increase in understanding, formal definition, or prevention.

i would love to see progress on the route leak problem.  i do not
confuddle it with security.

randy



Re: do not filter your customers

2012-02-24 Thread Randy Bush
>> I'm optimistic that all the good folks focusing on this in their day
>> jobs, and expressly funded and resourced to do so, will eventually
>> recognize what I'm calling "leaks" is part of the routing security 
>> problem.
>> 
> Sure; I don't disagree, and I don't think that Randy does.  But just
> because we can't solve the whole problem, does that mean we shouldn't
> solve any of it?

is it a *security* problem?  it is a violation of business intent.  and
one we would like to solve.  but it is not clear to me that 'leaks' are
really a security issue.

randy



Re: do not filter your customers

2012-02-24 Thread Nick Hilliard
On 24/02/2012 20:59, Leo Bicknell wrote:
> It turns out the real world is quite messy though, often full of
> temporary hacks, unusual relationships and other issues.

... and, if you create a top-down control mechanism to be superimposed upon
the current fully distributed control mechanism, you will soon find that
politicians and regulators will take a very keen interest in BGP once they
realise that they can turn off specific prefixes from a single point.

Whatever about temporary hacks and unusual relationships, the entropy
introduced by layers 9 through 12 is almost always insufferable.

Nick




Re: do not filter your customers

2012-02-24 Thread Nick Hilliard
On 24/02/2012 20:04, Shane Amante wrote:
> Solving for route leaks is /the/ "killer app" for BGPSEC.  I can't
> understand why people keep ignoring this.

I'd be interested to hear your opinions on exactly how rpki in its current
implementation would have prevented the optus/telstra problem.  Could you
elaborate?

Here's a quote from draft-ietf-sidr-origin-ops:

>As the BGP origin AS of an update is not signed, origin validation is
>open to malicious spoofing.  Therefore, RPKI-based origin validation
>is designed to deal only with inadvertent mis-advertisement.
> 
>Origin validation does not address the problem of AS-Path validation.
>Therefore paths are open to manipulation, either malicious or
>accidental.

An optus/telstra style problem might have been mitigated by an rpki based
full path validation mechanism, but we don't have path validation.  Right
now, we only have a draft of a list of must-have features -
draft-ietf-sidr-bgpsec-reqs.  This is only the first step towards designing
a functional protocol, not to mind having running code.

Nick



Re: do not filter your customers

2012-02-24 Thread Geoff Huston

On 25/02/2012, at 7:54 AM, Christopher Morrow wrote:

> On Fri, Feb 24, 2012 at 3:04 PM, Shane Amante  wrote:
> 
>> Solving for route leaks is /the/ "killer app" for BGPSEC.  I can't 
>> understand why people keep ignoring this.
> 
> I don't think anyone's ignoring the problem... I think lots of people
> have said an equivalent of:
> 1) "How do I know that this path: A - B - C - D
>  is a 'leak'?"
> 

If you are receiving a path of the form (A B C D), and the origination of the 
prefix at D is good, then the only way you can figure out this is a leak as 
compared to the intentional operation of BGP is not by looking at the operation 
of the protocol per se, but by looking at the routing policy intentions of A, B, 
C and D and working out if what you are seeing is intentional within the scope 
of the routing policies of these entities. RPSL is one such approach of 
describing such policy in a manner that one could perform some basic 
computation over the data.

It exposes a broader issue here about the difference between routing intent and 
protocol correctness. From the perspective of protocol correctness, regardless 
of whether the information was intended to be propagated, a protocol 
correctness tool should be able to tell you that the information has been 
faithfully propagated, but cannot tell you whether such propagation was 
intentional or not.
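
To make the intent-versus-protocol distinction concrete, here is a minimal
Python sketch of one commonly used policy test, the "valley-free" rule. It is
an illustration under a strong assumption: that the business relationship for
every adjacent pair of ASes in the path is known, which is exactly the data
this thread says is not reliably available. The relationship tables and the
names hop_kind and is_valley_free are hypothetical.

```python
# rel[(x, y)] says what y is to x: "provider", "peer", or "customer".
def hop_kind(rel, sender, receiver):
    """Classify one propagation hop as uphill, lateral, or downhill."""
    kinds = {"provider": "up", "peer": "peer", "customer": "down"}
    return kinds[rel[(sender, receiver)]]


def is_valley_free(as_path, rel):
    """as_path is as seen on the wire: nearest AS first, origin last."""
    propagation = list(reversed(as_path))  # order the route actually travelled
    descending = False  # True once a peer or downhill hop has been taken
    for sender, receiver in zip(propagation, propagation[1:]):
        kind = hop_kind(rel, sender, receiver)
        if descending and kind != "down":
            return False  # went back up or sideways after coming down
        if kind in ("peer", "down"):
            descending = True
    return True


# D is a customer of C, C peers with B, A is a customer of B: intended.
REL_OK = {("D", "C"): "provider", ("C", "B"): "peer", ("B", "A"): "customer"}

# C is a customer of both D and B and re-announces D's route to B: a leak.
REL_LEAK = {("D", "C"): "customer", ("C", "B"): "provider", ("B", "A"): "customer"}

ok = is_valley_free(["A", "B", "C", "D"], REL_OK)
leak = is_valley_free(["A", "B", "C", "D"], REL_LEAK)
```

Both paths are the same (A B C D) on the wire; only the policy data tells them
apart, and legitimate arrangements that are not valley-free would be flagged
by this test as well.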


> Followed by:
> 2) "Tell me how to answer this programmatically given the data we have
> today in the routing system" (bgp data on the wire, IRR data, RIR
> data)
> 

I wish.

> so far ... both of the above questions haven't been answered (well 1
> was answered with: "I will know it when i see it" which isn't helpful
> at all in finding a solution)


Some longstanding problems are longstanding because we have not quite managed 
to apply the appropriate analytical approach to the problem. Others are 
longstanding problems because they are damn difficult and this makes me wonder 
if we really understand the nature of the space we are working in. For example, 
if you think about routing not as a topology and reachability tool, but as a 
distributed algorithm to solve a set of simultaneous equations (policies) would 
that provide a different insight as to the way in which routing policies and 
routing protocols interact? 

Geoff








RE: do not filter your customers

2012-02-24 Thread George Bonser
> -----Original Message-----
> From: Leo Bicknell 
> Sent: Friday, February 24, 2012 1:00 PM

> There are plenty of cases where someone "leaks" more specifics with
> NO_EXPORT to only one of their BGP peers for the purposes of TE.
> 
> The challenge of securing BGP isn't crypto, and it isn't enough
> ram/cpu/whatever to process it.  The challenge is getting a crypto
> scheme that operators can use to easily represent the real world.
> It turns out the real world is quite messy though, often full of
> temporary hacks, unusual relationships and other issues.
> 
> I'm sure it will be solved, one day.

I can think of a way to do it but it would require some trust and it would 
require that people actually *used* it.  What one would do is feed the routes 
they are proposing to send to a BGP peer to a RIR front-end.  The receiving 
peer would "sign off" on the proposal and the routes would be then entered into 
the RIR.  That is the step that is currently missing.  Anyone can enter 
practically anything into an RIR and the receiving side never gets to "sanity 
check" the information before it actually gets written to the database.  Once 
you have this base of information, route filtration generated from the database 
becomes more reliable.

In fact, a network might have several "canned" profiles of different route 
packages registered in the front end. A "transit" package, a "customer routes" 
package and maybe some specialized packages for peering at various 
private/public exchange points.  If you pick up a new peer at a transit point, 
you select the package for that point, it proposes that to the peer, peer 
approves it, and they can both generate their route filters from that 
information.

It could even highlight some glaring errors automatically to spot what might be 
a typo or even attempted nefarious activity.  The receiver of a proposed change 
might be alerted to the fact that the new route(s) being offered are 
inconsistent with the database information (routes already being sourced by an 
AS that the proposed sender is not peering with) which could be overridden by 
the receiver (or just ignored) but having something show up in some way that 
highlights a possible inconsistency might generate a closer look at that 
proposal and head off problems later.

But the fundamental problem is that the current system is "open loop".
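
A toy Python sketch of the "closed loop" described above: the sender proposes
routes, the receiver signs off, and only approved proposals feed the generated
filter. Everything here is hypothetical (the Registry and Proposal classes,
the AS numbers, the data layout); no RIR or IRR offers this interface, and the
inconsistency check is only one possible reading of the paragraph above.

```python
from dataclasses import dataclass, field


@dataclass
class Proposal:
    sender_as: int
    receiver_as: int
    prefixes: frozenset
    approved: bool = False


@dataclass
class Registry:
    origins: dict = field(default_factory=dict)    # prefix -> AS already sourcing it
    peerings: set = field(default_factory=set)     # frozenset({as1, as2}) pairs
    proposals: list = field(default_factory=list)

    def propose(self, sender_as, receiver_as, prefixes):
        p = Proposal(sender_as, receiver_as, frozenset(prefixes))
        self.proposals.append(p)
        return p

    def warnings(self, p):
        """Prefixes already sourced by an AS the sender does not peer with."""
        flagged = []
        for prefix in sorted(p.prefixes):
            origin = self.origins.get(prefix)
            if origin is None or origin == p.sender_as:
                continue
            if frozenset({origin, p.sender_as}) not in self.peerings:
                flagged.append(prefix)
        return flagged

    def approve(self, p, approver_as):
        if approver_as != p.receiver_as:
            raise PermissionError("only the receiving peer can sign off")
        p.approved = True

    def filter_for(self, sender_as, receiver_as):
        """Prefix list generated only from approved proposals."""
        allowed = set()
        for p in self.proposals:
            if p.approved and (p.sender_as, p.receiver_as) == (sender_as, receiver_as):
                allowed |= p.prefixes
        return sorted(allowed)


reg = Registry(origins={"203.0.113.0/24": 64999})
prop = reg.propose(64500, 64501, ["192.0.2.0/24", "203.0.113.0/24"])
flagged = reg.warnings(prop)             # shown to the receiver before sign-off
before = reg.filter_for(64500, 64501)    # nothing is usable until approval
reg.approve(prop, approver_as=64501)     # receiver chooses to override the warning
after = reg.filter_for(64500, 64501)
```

In this sketch the warning is advisory, matching the suggestion that the
receiver can override or ignore it.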




Re: do not filter your customers

2012-02-24 Thread Steven Bellovin

On Feb 24, 2012, at 2:26 14PM, Danny McPherson wrote:

> 
> On Feb 24, 2012, at 1:10 PM, Steven Bellovin wrote:
> 
>> But just because we can't solve the whole problem, does that
>> mean we shouldn't solve any of it?
> 
> Nope, we most certainly should decompose the problem into 
> addressable elements, that's core to engineering and operations.
> 
> However, simply because the currently envisaged solution 
> doesn't solve this problem doesn't mean we shouldn't 
> acknowledge it exists.
> 
> The IETF's BGP security threats document [1]  "describes a threat 
> model for BGP path security", which constrains itself to the 
> carefully worded SIDR WG charter, which addresses route origin 
> authorization and AS_PATH "semantics" -- i.e., this "leak" 
> problem is expressly out of scope of a threats document
> discussing BGP path security - eh? 
> 
> How the heck we can talk about BGP path security and not 
> consider this incident a threat is beyond me, particularly when it 
> happens by accident all the time.  How we can justify putting all 
> that BGPSEC and RPKI machinery in place and not address this 
> "leak" issue somewhere in the mix is, err.., telling.


I repeat -- we're in violent agreement that route leaks are
a serious problem.  No one involved in BGPSEC -- not me, not Randy,
not anyone -- disagrees.  Give us an actionable definition and
we'll try to build a defense.  Right now, we have nothing better
than what Justice Potter Stewart once said in an opinion: "I shall 
not today attempt further to define the kinds of material I 
understand to be embraced within that shorthand description 
["hard-core pornography"]; and perhaps I could never succeed 
in intelligibly doing so. But I know it when I see it..."

Again -- *please* give us a definition.

--Steve Bellovin, https://www.cs.columbia.edu/~smb

P.S. It was routing problems, including leaks between RIP and either
EIGRP or OSPF (it's been >20 years; I just don't remember), that got
me involved in Internet security in the first place.  I really do
understand the issue.




Re: do not filter your customers

2012-02-24 Thread Christopher Morrow
On Fri, Feb 24, 2012 at 4:29 PM, Leo Bicknell  wrote:
> In a message written on Fri, Feb 24, 2012 at 04:07:28PM -0500, Christopher 
> Morrow wrote:
>> well for bgpsec so if the paths were signed, and origins signed,
>> why would they NOT pass BGPSEC muster?
>
> I honestly have trouble keeping the BGP security work straight.

yes

> There is work to secure the sessions, work to authenticate route
> origin, work to authenticate the AS-Path, the peer relationships,
> and so on.
>
> I believe BGPSEC authenticates the AS-Path, and thus turning up a
> new peer requires them to each sign each others "path object".

well currently it doesn't do anything (really) but the PLAN is that
you'd be able to look at the origin, view some transitive
community/attribute and say: "That validates with the roa data" - some
cert-check/hash-check/etc.

then later on you'd be able to say for each AS in the ASPATH:
  "Yes, the route is signed by AS1, the signature validates. Yes the
route is signed by AS2, the signature validates (wash/rinse/repeat for
the whole path)"
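
For readers who want the origin step spelled out, here is a minimal Python
sketch of checking a route's origin against ROA data, loosely following the
valid / invalid / not-found outcomes used in the origin validation drafts. The
ROA entries, AS numbers and the name validate_origin are made up for
illustration, and the per-AS signature checks are not modelled at all.

```python
import ipaddress

# Hypothetical ROA entries: (prefix, max_length, authorised origin AS).
ROAS = [
    (ipaddress.ip_network("192.0.2.0/24"), 24, 64500),
    (ipaddress.ip_network("2001:db8::/32"), 48, 64501),
]


def validate_origin(prefix, origin_as, roas=ROAS):
    """Return 'valid', 'invalid' or 'not-found' for a (prefix, origin) pair."""
    net = ipaddress.ip_network(prefix)
    covered = False
    for roa_net, max_len, roa_as in roas:
        # subnet_of() raises on mixed address families, so compare versions first.
        if net.version != roa_net.version or not net.subnet_of(roa_net):
            continue
        covered = True
        if net.prefixlen <= max_len and origin_as == roa_as:
            return "valid"
    return "invalid" if covered else "not-found"


results = {
    "right origin": validate_origin("192.0.2.0/24", 64500),
    "wrong origin": validate_origin("192.0.2.0/24", 64511),
    "within max length": validate_origin("2001:db8:1::/48", 64501),
    "too specific": validate_origin("2001:db8:1:1::/64", 64501),
    "no covering ROA": validate_origin("203.0.113.0/24", 64500),
}
```

Note that a leaked route keeps its original origin AS, so it would come out
"valid" here; this check says nothing about the path.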

> During the time period between when the route propagates and the
> signature propagates these routes appear to be a leak.  I don't

signatures follow inside the announcement as currently draft-spec'd.

> believe the signature data is moved via BGP.  Worse, in this case,
> imagine if one of the parties was "cut off" from the signature
> distribution system.  They would need to bring up their (non-validating)
> routes to reach the signature distribution system before their
> routes would be accepted!

the sig data for an NLRI follows along inside the announcement.
the cache of data is probably updated inside of a day... there's
likely some skew, but provided the origins don't change and no one has
to emergency release new key materials, I think it's not important for
this discussion.

you simply start hearing routes with same origin as previously on
different paths. "new customers" essentially pop up en masse. This
isn't a problem as long as the customers are the same origin-as as
before... it'd mean some rejiggering of prefix-lists (as I said
before) but ... you'd be doing that anyway.

> In fact, this happens today with those who strict IRR filter.  Try
> getting a block from ARIN, and then service from a provider who
> only uses IRR filters.  The answer is to go to some other already
> up and working network to submit your IRR data to the IRR server,
> before your network can come up and be accepted!

right, there's some lag between publication and acceptance/update. I
think in the case of (for example) L(3) the lag is ~6hrs in the worst
case.

> On a new turn up for an end-user, not a big deal.  When you look at the
> problems that might occur in the face of natural or man made disasters
> though, like the cable cut, it could result in outages that could have
> been fixed in minutes with a non-validating system taking hours to fix in
> a validating one.

I don't think that's really the case, but walking through the
processes/requirements seems like a sane thing to do.

> That may be an acceptable trade off to get security; but it depends on
> exactly what the trade off ends up being.  To date, I personally have
> found "insecure" BGP, even with the occasional leaks, to be a better
> overall solution.

how's that chinese leak of F-root doing for you? :)

-chris



Re: do not filter your customers

2012-02-24 Thread Leo Bicknell
In a message written on Fri, Feb 24, 2012 at 04:07:28PM -0500, Christopher 
Morrow wrote:
> well for bgpsec so if the paths were signed, and origins signed,
> why would they NOT pass BGPSEC muster?

I honestly have trouble keeping the BGP security work straight.
There is work to secure the sessions, work to authenticate route
origin, work to authenticate the AS-Path, the peer relationships,
and so on.

I believe BGPSEC authenticates the AS-Path, and thus turning up a
new peer requires them to each sign each others "path object".

During the time period between when the route propagates and the
signature propagates these routes appear to be a leak.  I don't
believe the signature data is moved via BGP.  Worse, in this case,
imagine if one of the parties was "cut off" from the signature
distribution system.  They would need to bring up their (non-validating)
routes to reach the signature distribution system before their
routes would be accepted!

In fact, this happens today with those who strict IRR filter.  Try
getting a block from ARIN, and then service from a provider who
only uses IRR filters.  The answer is to go to some other already
up and working network to submit your IRR data to the IRR server,
before your network can come up and be accepted!

On a new turn up for an end-user, not a big deal.  When you look at the
problems that might occur in the face of natural or man made disasters
though, like the cable cut, it could result in outages that could have
been fixed in minutes with a non-validating system taking hours to fix in
a validating one.

That may be an acceptable trade off to get security; but it depends on
exactly what the trade off ends up being.  To date, I personally have
found "insecure" BGP, even with the occasional leaks, to be a better
overall solution.

-- 
   Leo Bicknell - bickn...@ufp.org - CCIE 3440
PGP keys at http://www.ufp.org/~bicknell/



Re: do not filter your customers

2012-02-24 Thread Christopher Morrow
On Fri, Feb 24, 2012 at 3:59 PM, Leo Bicknell  wrote:
> In a message written on Fri, Feb 24, 2012 at 01:04:20PM -0700, Shane Amante 
> wrote:
>> Solving for route leaks is /the/ "killer app" for BGPSEC.  I can't 
>> understand why people keep ignoring this.
>
> Not all "leaks" are bad.
>
> I remember when there was that undersea landslide in Asia that took
> out a bunch of undersea cables.  Various providers quickly did
> mutual transit and other arrangements to route around the problem,
> getting a number of things back up quite quickly.  These did not
> match IRR records though, and likely would not have matched BGPSEC
> information, at least not initially.

well for bgpsec so if the paths were signed, and origins signed,
why would they NOT pass BGPSEC muster?

I can see that if the IRR data didn't match up sanely
prefix-lists/filters would need some cajoling, but that likely
happened anyway in this case.

-chris



Re: do not filter your customers

2012-02-24 Thread Leo Bicknell
In a message written on Fri, Feb 24, 2012 at 01:04:20PM -0700, Shane Amante 
wrote:
> Solving for route leaks is /the/ "killer app" for BGPSEC.  I can't understand 
> why people keep ignoring this.

Not all "leaks" are bad.

I remember when there was that undersea landslide in Asia that took
out a bunch of undersea cables.  Various providers quickly did
mutual transit and other arrangements to route around the problem,
getting a number of things back up quite quickly.  These did not
match IRR records though, and likely would not have matched BGPSEC
information, at least not initially.

There are plenty of cases where someone "leaks" more specifics with
NO_EXPORT to only one of their BGP peers for the purposes of TE.

The challenge of securing BGP isn't crypto, and it isn't enough
ram/cpu/whatever to process it.  The challenge is getting a crypto
scheme that operators can use to easily represent the real world.
It turns out the real world is quite messy though, often full of
temporary hacks, unusual relationships and other issues.

I'm sure it will be solved, one day.

-- 
   Leo Bicknell - bickn...@ufp.org - CCIE 3440
PGP keys at http://www.ufp.org/~bicknell/



Re: do not filter your customers

2012-02-24 Thread Christopher Morrow
On Fri, Feb 24, 2012 at 3:04 PM, Shane Amante  wrote:

> Solving for route leaks is /the/ "killer app" for BGPSEC.  I can't understand 
> why people keep ignoring this.

I don't think anyone's ignoring the problem... I think lots of people
have said an equivalent of:
1) "How do I know that this path: A - B - C - D
  is a 'leak'?"

Followed by:
2) "Tell me how to answer this programmatically given the data we have
today in the routing system" (bgp data on the wire, IRR data, RIR
data)

so far ... both of the above questions haven't been answered (well 1
was answered with: "I will know it when i see it" which isn't helpful
at all in finding a solution)

-chris



Re: do not filter your customers

2012-02-24 Thread Shane Amante
Steve,

On Feb 24, 2012, at 11:10 AM, Steven Bellovin wrote:
> On Feb 24, 2012, at 7:46 40AM, Danny McPherson wrote:
>> On Feb 23, 2012, at 10:42 PM, Randy Bush wrote:
>>> the problem is that you have yet to rigorously define it and how to
>>> unambiguously and rigorously detect it.  lack of that will prevent
>>> anyone from helping you prevent it.
>> 
>> You referred to this incident as a "leak" in your message:
>> 
>> "a customer leaked a full table"
>> 
>> I was simply agreeing with you -- i.e., looked like a "leak", smelled 
>> like a "leak" - let's call it a leak.
>> 
>> I'm optimistic that all the good folks focusing on this in their day
>> jobs, and expressly funded and resourced to do so, will eventually
>> recognize what I'm calling "leaks" is part of the routing security 
>> problem.
>> 
> Sure; I don't disagree, and I don't think that Randy does.  But just
> because we can't solve the whole problem, does that mean we shouldn't
> solve any of it?

Solving for route leaks is /the/ "killer app" for BGPSEC.  I can't understand 
why people keep ignoring this.

As has been discussed in the SIDR WG, BGPSEC will _increase_ state in BGP, 
(more DRAM needed in PE's and RR's, crypto processors to verify sigs, more 
UPDATE traffic for beaconing).  And, at the end of the day, ISP's are going to 
go to their customers and say to them:
- BGP convergence may be slower than in the past, because we're shipping sigs 
around in BGP now
- we can prevent a malicious attack from a random third-party (in the right 
part of the topology);
- *but* I can't protect you from a 20+ year old problem of a transit customer 
accidentally -or- maliciously stealing/dropping your traffic if they leak 
routes from one provider to another provider?


> As Randy said, we can't even try for a strong technical solution
> until we have a definition that's better than "I know it when I see it".

The first step is admitting that we have a problem, then discussing it 
collectively to try to determine a way to prevent said problem from happening.

-shane


Re: do not filter your customers

2012-02-24 Thread Danny McPherson

On Feb 24, 2012, at 2:49 PM, Richard Barnes wrote:

> You seem to think that there's some extension/modification to BGPSEC
> that would fix route leaks in addition to the ASPATH issues that
> BGPSEC addresses right now.  Have you written this up anywhere?  I
> would be interested to read it.

I don't, actually -- as I haven't presupposed that "BGPSEC" is the 
answer to all things routing security related, nor have I excluded it.

I didn't realize it was unacceptable to acknowledge a problem exists 
without having solved it already.  I might have that backwards though.

-danny




Re: do not filter your customers

2012-02-24 Thread Richard Barnes
>> I think if we asked telstra why they didn't filter their customer some
>> answer like:
>> 1) we did, we goofed, oops!
>> 2) we don't it's too hard
>> 3) filters? what?
>>
>> I suspect in the case of 1 it's a software problem that needs more
>> belts/suspenders
>> I suspect in the case of 2 it's a problem that could be shown to be
>> simpler with some resource-certification in place
>> I suspect 3 is not likely... (or I hope so).
>>
>> So, even without defining what a leak is, providing a tool to better
>> create/verify filtering would be a boon.
>
>
>
> Yes, I agree!
>
> What I'd hate to see is:
>
> 4) We fully deployed BGPSEC, and RPKI, and upgraded our
> infrastructure, and retooled provisioning, operations and processes
> to support it all fully, and required our customers and peers to use it,
> and even then this still happened - WTF was the point?

I think this is the point:



> This "leak" thing is a key vulnerability that simply can't be brushed
> aside - that's the crux of my frustration with the current effort.

You seem to think that there's some extension/modification to BGPSEC
that would fix route leaks in addition to the ASPATH issues that
BGPSEC addresses right now.  Have you written this up anywhere?  I
would be interested to read it.

--Richard



Re: do not filter your customers

2012-02-24 Thread Danny McPherson

On Feb 24, 2012, at 2:29 PM, Christopher Morrow wrote:
> 
> I think if we asked telstra why they didn't filter their customer some
> answer like:
> 1) we did, we goofed, oops!
> 2) we don't it's too hard
> 3) filters? what?
> 
> I suspect in the case of 1 it's a software problem that needs more
> belts/suspenders
> I suspect in the case of 2 it's a problem that could be shown to be
> simpler with some resource-certification in place
> I suspect 3 is not likely... (or I hope so).
> 
> So, even without defining what a leak is, providing a tool to better
> create/verify filtering would be a boon.



Yes, I agree!

What I'd hate to see is:

4) We fully deployed BGPSEC, and RPKI, and upgraded our 
infrastructure, and retooled provisioning, operations and processes 
to support it all fully, and required our customers and peers to use it, 
and even then this still happened - WTF was the point?

This "leak" thing is a key vulnerability that simply can't be brushed 
aside - that's the crux of my frustration with the current effort.

-danny




Re: do not filter your customers

2012-02-24 Thread Christopher Morrow
On Fri, Feb 24, 2012 at 2:26 PM, Danny McPherson  wrote:

> happens by accident all the time.  How we can justify putting all
> that BGPSEC and RPKI machinery in place and not address this
> "leak" issue somewhere in the mix is, err.., telling.

I think if we asked telstra why they didn't filter their customer we'd
get some answer like:
 1) we did, we goofed, oops!
 2) we don't it's too hard
 3) filters? what?

I suspect in the case of 1 it's a software problem that needs more
belts/suspenders
I suspect in the case of 2 it's a problem that could be shown to be
simpler with some resource-certification in place
I suspect 3 is not likely... (or I hope so).

So, even without defining what a leak is, providing a tool to better
create/verify filtering would be a boon.



Re: do not filter your customers

2012-02-24 Thread Danny McPherson

On Feb 24, 2012, at 1:10 PM, Steven Bellovin wrote:

> But just because we can't solve the whole problem, does that
> mean we shouldn't solve any of it?

Nope, we most certainly should decompose the problem into 
addressable elements, that's core to engineering and operations.

However, simply because the currently envisaged solution 
doesn't solve this problem doesn't mean we shouldn't 
acknowledge it exists.

The IETF's BGP security threats document [1]  "describes a threat 
model for BGP path security", which constrains itself to the 
carefully worded SIDR WG charter, which addresses route origin 
authorization and AS_PATH "semantics" -- i.e., this "leak" 
problem is expressly out of scope of a threats document
discussing BGP path security - eh? 

How the heck we can talk about BGP path security and not 
consider this incident a threat is beyond me, particularly when it 
happens by accident all the time.  How we can justify putting all 
that BGPSEC and RPKI machinery in place and not address this 
"leak" issue somewhere in the mix is, err.., telling.

Alas, I suspect we can all agree that experiments are good and 
the market will ultimately decide.

-danny

[1] draft-ietf-sidr-bgpsec-threats-02



Re: do not filter your customers

2012-02-24 Thread Joe Maimon



goe...@anime.net wrote:
>
> On Fri, 24 Feb 2012, Steven Bellovin wrote:
>
>> Sure; I don't disagree, and I don't think that Randy does. But just
>> because we can't solve the whole problem, does that mean we shouldn't
>> solve any of it?
>
> that is often the way things are argued in engineering circles.
>
> the solution is imperfect therefore it is useless.
>
> this philosophy is reflected in the shoddy state of networks today.
>
> -Dan


Due to which side winning the debate?

Joe



Re: do not filter your customers

2012-02-24 Thread goemon

On Fri, 24 Feb 2012, Steven Bellovin wrote:

> Sure; I don't disagree, and I don't think that Randy does.  But just
> because we can't solve the whole problem, does that mean we shouldn't
> solve any of it?


that is often the way things are argued in engineering circles.

the solution is imperfect therefore it is useless.

this philosophy is reflected in the shoddy state of networks today.

-Dan



Re: do not filter your customers

2012-02-24 Thread Steven Bellovin

On Feb 24, 2012, at 7:46 40AM, Danny McPherson wrote:

> 
> On Feb 23, 2012, at 10:42 PM, Randy Bush wrote:
> 
>> the problem is that you have yet to rigorously define it and how to
>> unambiguously and rigorously detect it.  lack of that will prevent
>> anyone from helping you prevent it.
> 
> You referred to this incident as a "leak" in your message:
> 
> "a customer leaked a full table"
> 
> I was simply agreeing with you -- i.e., looked like a "leak", smelled 
> like a "leak" - let's call it a leak.
> 
> I'm optimistic that all the good folks focusing on this in their day
> jobs, and expressly funded and resourced to do so, will eventually
> recognize what I'm calling "leaks" is part of the routing security 
> problem.
> 
Sure; I don't disagree, and I don't think that Randy does.  But just
because we can't solve the whole problem, does that mean we shouldn't
solve any of it?

As Randy said, we can't even try for a strong technical solution
until we have a definition that's better than "I know it when I see it".



--Steve Bellovin, https://www.cs.columbia.edu/~smb








Re: do not filter your customers

2012-02-24 Thread Danny McPherson

On Feb 23, 2012, at 10:42 PM, Randy Bush wrote:

> the problem is that you have yet to rigorously define it and how to
> unambiguously and rigorously detect it.  lack of that will prevent
> anyone from helping you prevent it.

You referred to this incident as a "leak" in your message:

"a customer leaked a full table"

I was simply agreeing with you -- i.e., looked like a "leak", smelled 
like a "leak" - let's call it a leak.

I'm optimistic that all the good folks focusing on this in their day
jobs, and expressly funded and resourced to do so, will eventually
recognize what I'm calling "leaks" is part of the routing security 
problem.

-danny



Re: do not filter your customers

2012-02-23 Thread Randy Bush
> Also, it's important that network operators understand that
> flap-dampening has been iatrogenic for many years, now.

well, ... 

https://datatracker.ietf.org/doc/draft-ymbk-rfd-usable/

randy



Re: do not filter your customers

2012-02-23 Thread Dobbins, Roland

On Feb 24, 2012, at 9:00 AM, Danny McPherson wrote:

> Prefix limits are rather binary and indiscriminate, indeed.

AS-PATH filters and max-length filters, OTOH, are not.
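
A small Python sketch of what such filters might check on a customer session.
The length limits (/24 for IPv4, /48 for IPv6), the AS numbers and the function
name accept_from_customer are assumptions for illustration, not a recommended
policy.

```python
import ipaddress

# Assumed conventional limits on how specific an accepted prefix may be.
MAX_LEN = {4: 24, 6: 48}


def accept_from_customer(prefix, as_path, customer_as, customer_cone):
    """Accept only short-enough prefixes whose whole path stays in the cone."""
    net = ipaddress.ip_network(prefix)
    if net.prefixlen > MAX_LEN[net.version]:
        return False  # max-length filter
    if not as_path or as_path[0] != customer_as:
        return False  # first AS must be the customer itself
    return all(asn in customer_cone for asn in as_path)  # AS-PATH filter


cone = {64500, 64510}  # the customer and its one registered downstream
own_route = accept_from_customer("192.0.2.0/24", [64500], 64500, cone)
downstream = accept_from_customer("198.51.100.0/24", [64500, 64510], 64500, cone)
leaked = accept_from_customer("203.0.113.0/24", [64500, 64496, 64497], 64500, cone)
too_long = accept_from_customer("192.0.2.128/25", [64500], 64500, cone)
```

Unlike a prefix limit, this rejects the individual offending routes (a leaked
full table shows up as paths through ASes outside the cone) and leaves the
session and the legitimate routes alone.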

Also, it's important that network operators understand that flap-dampening has 
been iatrogenic for many years, now.

---
Roland Dobbins  // 

  Luck is the residue of opportunity and design.

   -- John Milton




Re: do not filter your customers

2012-02-23 Thread Randy Bush
>> a customer leaked a full table to smellstra, and they had not filtered.
>> hence the $subject.
> 
> Ahh, this is I think the customer "leak" problem I'm trying to illustrate 
> that an RPKI/BGPSEC-enabled world alone (as currently prescribed) 
> does NOT protect against.  

the problem is that you have yet to rigorously define it and how to
unambiguously and rigorously detect it.  lack of that will prevent
anyone from helping you prevent it.

randy



Re: do not filter your customers

2012-02-23 Thread Danny McPherson

On Feb 23, 2012, at 1:44 AM, Randy Bush wrote:

> a customer leaked a full table to smellstra, and they had not filtered.
> hence the $subject.

Ahh, this is I think the customer "leak" problem I'm trying to illustrate 
that an RPKI/BGPSEC-enabled world alone (as currently prescribed) 
does NOT protect against.  

If it can happen by accident, it can certainly serve as smoke screen or
enable an actual targeted attack quite nicely by those so compelled.

> and things went further downhill from there, when telstra also did not
> filter what they announced to their peers, and the peers went over
> prefix limits and dropped bgp.

Prefix limits are rather binary and indiscriminate, indeed.

-danny



Re: do not filter your customers

2012-02-23 Thread virendra rode
Speaking of leaking the world, I remember one of our transit peers, during
its nightly maintenance, deciding it needed people to talk to, so it
shared some love by passing us ~350k routes, causing a meltdown.

As a lesson learned, we included a combination of prefix-list and
maximum-prefix filters as part of our config script.

When the prefix count reaches a certain percentage of the hard limit, we
get alerted that the neighbor is approaching the limit.
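The warning-threshold behaviour described above can be sketched in Python roughly as follows. This is not the config script mentioned in the message; the class name `MaxPrefixPolicy`, the 80% threshold, and the limit of 1000 are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class MaxPrefixPolicy:
    """Hypothetical per-neighbor limit; the numbers below are illustrative."""

    hard_limit: int         # prefixes accepted before the session is torn down
    warn_percent: int = 80  # alert once the count reaches this share of the limit

    def evaluate(self, prefix_count: int) -> str:
        """Return 'ok', 'warn', or 'teardown' for the current prefix count."""
        if prefix_count > self.hard_limit:
            return "teardown"
        # Integer arithmetic avoids floating-point rounding at the boundary.
        if prefix_count * 100 >= self.hard_limit * self.warn_percent:
            return "warn"
        return "ok"


policy = MaxPrefixPolicy(hard_limit=1000, warn_percent=80)
for count in (500, 800, 1001, 350_000):
    print(count, policy.evaluate(count))
```

The warning gives operators time to react before the hard limit is hit, but the limit itself is still all-or-nothing: once exceeded, every route from that neighbor is lost.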


regards,
/virendra

On 02/22/2012 09:41 PM, Randy Bush wrote:
> don't filter your customers.  when they leak the world to you, it will
> get you a lot of free press and your marketing department will love you.
> 
> just ask telstra.
> 
> randy
> 
> 



Re: do not filter your customers

2012-02-23 Thread Christopher Morrow
On Thu, Feb 23, 2012 at 1:57 AM, Randy Bush  wrote:
>>> and things went further downhill from there, when telstra also did not
>>> filter what they announced to their peers, and the peers went over
>>> prefix limits and dropped bgp.
>> Oh! so protections worked!
>
> imiho, prefix count is too big a hammer.

sure. aspath-filter! :)

> it would have been better if optus had irr-based filters in place on
> peerings with telstra.  then they would not have dropped the sessions
> and their customers could still reach telstra customers.

really, both parties need/should-have filters, right?
both parties should have their 'irr data' up-to-date...
both parties should also filter outbound prefixes (so they don't leak
internals, or ...etc)

telstra seems to have ~8880 or so prefixes registered in IRRs (via
radb whois lookup)
optus has ~1217 or so prefixes registered in IRRs (again via the same
lookup to radb)
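Counts like these can be derived by collecting the `route:` and `route6:` attributes from the RPSL objects an IRR returns. The sketch below parses an inline sample rather than querying a live server; the sample objects, the ASN, and the function name `registered_prefixes` are made up for illustration, and the exact whois query used for the numbers above is not given in the message.

```python
def registered_prefixes(rpsl_text):
    """Collect the prefixes named in route/route6 objects of RPSL text."""
    prefixes = set()
    for line in rpsl_text.splitlines():
        # Split on the first colon only, so IPv6 prefixes stay intact.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in ("route", "route6"):
            prefixes.add(value.strip())
    return prefixes


# Hypothetical IRR output using documentation prefixes and a documentation ASN.
SAMPLE = """\
route:      198.51.100.0/24
descr:      Example customer
origin:     AS64500
source:     EXAMPLE

route:      203.0.113.0/24
origin:     AS64500
source:     EXAMPLE

route6:     2001:db8::/32
origin:     AS64500
source:     EXAMPLE
"""

prefixes = registered_prefixes(SAMPLE)
print(len(prefixes), sorted(prefixes))
```

A set is used so that a prefix registered more than once (for example, in several mirrored sources) is counted a single time.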

> of course, if telstra did not publish accurately in an irr instance,
> not much optus could do.

it's not clear how accurate the data is :( I do see one example that's
not telstra (and which I don't see through telstra from one host I
tested from)
  203.59.57.0/24

a REACH customer, supposedly, registered by REACH on behalf of the
customer... though the whole /16 there is allocated to that same entity,
not to REACH, so that's a tad confusing.

-chris



Re: do not filter your customers

2012-02-23 Thread Christian de Larrinaga
not just the .au govt
C
On 23 Feb 2012, at 07:54, Jay Mitchell wrote:

> I'm laughing now, but it wasn't funny a couple of hours ago. Seems a lot of 
> the .au govt needs to learn some carrier diversity...
> 
> On 23/02/2012, at 4:41 PM, Randy Bush  wrote:
> 
>> don't filter your customers.  when they leak the world to you, it will
>> get you a lot of free press and your marketing department will love you.
>> 
>> just ask telstra.
>> 
>> randy
>> 
> 




Re: do not filter your customers

2012-02-23 Thread Anurag Bhatia
Haha!  Funny

Anurag Bhatia
http://anuragbhatia.com
On Feb 23, 2012 12:27 PM, "Randy Bush"  wrote:

> >> and things went further downhill from there, when telstra also did not
> >> filter what they announced to their peers, and the peers went over
> >> prefix limits and dropped bgp.
> > Oh! so protections worked!
>
> imiho, prefix count is too big a hammer.
>
> it would have been better if optus had irr-based filters in place on
> peerings with telstra.  then they would not have dropped the sessions
> and their customers could still reach telstra customers.
>
> of course, if telstra did not publish accurately in an irr instance,
> not much optus could do.
>
> randy
>
>


Re: do not filter your customers

2012-02-22 Thread Jay Mitchell
I'm laughing now, but it wasn't funny a couple of hours ago. Seems a lot of the 
.au govt needs to learn some carrier diversity...

On 23/02/2012, at 4:41 PM, Randy Bush  wrote:

> don't filter your customers.  when they leak the world to you, it will
> get you a lot of free press and your marketing department will love you.
> 
> just ask telstra.
> 
> randy
> 



Re: do not filter your customers

2012-02-22 Thread Peter Ehiwe
IOS-XR

On 2/23/12, Randy Bush  wrote:
>>> and things went further downhill from there, when telstra also did not
>>> filter what they announced to their peers, and the peers went over
>>> prefix limits and dropped bgp.
>> Oh! so protections worked!
>
> imiho, prefix count is too big a hammer.
>
> it would have been better if optus had irr-based filters in place on
> peerings with telstra.  then they would not have dropped the sessions
> and their customers could still reach telstra customers.
>
> of course, if telstra did not publish accurately in an irr instance,
> not much optus could do.
>
> randy
>
>


-- 
Warm Regards

Peter(CCIE 23782).



Re: do not filter your customers

2012-02-22 Thread Randy Bush
>> and things went further downhill from there, when telstra also did not
>> filter what they announced to their peers, and the peers went over
>> prefix limits and dropped bgp.
> Oh! so protections worked!

imiho, prefix count is too big a hammer.

it would have been better if optus had irr-based filters in place on
peerings with telstra.  then they would not have dropped the sessions
and their customers could still reach telstra customers.

of course, if telstra did not publish accurately in an irr instance,
not much optus could do.
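To make the contrast with a prefix limit concrete, here is a minimal Python sketch of an IRR-based filter: routes not covered by a registered prefix are dropped one by one, and nothing requires tearing down the session. The registered list, the announced list, and the function name `split_announcements` are illustrative assumptions. Accepting more-specifics of a registered prefix is a policy choice made for this example; an exact-match policy would be stricter.

```python
import ipaddress


def split_announcements(announced, registered):
    """Partition announced prefixes into (accepted, rejected).

    A prefix is accepted when it equals, or is a more-specific of,
    some registered prefix of the same address family.
    """
    allowed = [ipaddress.ip_network(p) for p in registered]
    accepted, rejected = [], []
    for prefix in announced:
        net = ipaddress.ip_network(prefix)
        covered = any(
            net.version == a.version and net.subnet_of(a) for a in allowed
        )
        (accepted if covered else rejected).append(prefix)
    return accepted, rejected


# Hypothetical data: what the peer registered vs. what it actually sent.
registered = ["198.51.100.0/24", "203.0.113.0/24"]
announced = ["198.51.100.0/24", "203.0.113.0/25", "192.0.2.0/24", "0.0.0.0/0"]

accepted, rejected = split_announcements(announced, registered)
print("accepted:", accepted)
print("rejected:", rejected)
```

As the message notes, this only works as well as the registry data: if the peer has not published accurately, legitimate routes end up in the rejected list.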

randy



Re: do not filter your customers

2012-02-22 Thread Christopher Morrow
On Thu, Feb 23, 2012 at 1:44 AM, Randy Bush  wrote:
> and things went further downhill from there, when telstra also did not
> filter what they announced to their peers, and the peers went over
> prefix limits and dropped bgp.

Oh! so protections worked!

>:)



Re: do not filter your customers

2012-02-22 Thread Randy Bush
> "Dodo has revealed a "minor hardware issue" was behind a Telstra
> outage that impacted multiple service providers and internet services
> nationwide"

bs, trying to blame it on a vendor.

a customer leaked a full table to smellstra, and they had not filtered.
hence the $subject.

and things went further downhill from there, when telstra also did not
filter what they announced to their peers, and the peers went over
prefix limits and dropped bgp.

randy



RE: do not filter your customers

2012-02-22 Thread Christian Nielsen
Who once said, there is no such thing as bad press?

http://www.smh.com.au/technology/technology-news/dodo-takes-blame-for-internet-outage-affecting-millions-20120223-1tpqq.html

http://www.itnews.com.au/News/291364,telstra-router-causes-major-internet-outage.aspx

"Dodo has revealed a "minor hardware issue" was behind a Telstra outage that 
impacted multiple service providers and internet services nationwide"

Does anyone have any additional details?

Christian

-----Original Message-----
From: Randy Bush [mailto:ra...@psg.com] 
Sent: Wednesday, February 22, 2012 9:42 PM
To: North American Network Operators' Group
Subject: do not filter your customers

don't filter your customers.  when they leak the world to you, it will get you 
a lot of free press and your marketing department will love you.

just ask telstra.

randy




do not filter your customers

2012-02-22 Thread Randy Bush
don't filter your customers.  when they leak the world to you, it will
get you a lot of free press and your marketing department will love you.

just ask telstra.

randy