Re: [sidr] bgpsec-reqs-00

Jared Mauch Fri, 11 Feb 2011 13:03:10 -0800

On Feb 11, 2011, at 10:51 AM, Shane Amante wrote:

> Randy,
> 
> On Jan 30, 2011, at 20:40 MST, Randy Bush wrote:
>>> 3.3 As cryptographic payloads and loading on routers are likely to
>>> seriously increase, a BGPsec design may require use of new hardware.
>>> It must be possible to build routers that do BGPsec within
>>> acceptable (to operators) bounds of cost and performance.
>>> 
>>> This should be left out of any requirements document, and various
>>> proposed systems compared based on their costs and deployment
>>> difficulty.
>> 
>> i take your point.  the intent was that compatibility with current
>> hardware abilities is not a requirement that this document imposes on a
>> solution.  it is quite likely that provider routers will need crypto
>> assist and more ram.  though one hopes that the stub customer edge will
>> not.
> 
> Whoa there.  I couldn't disagree more wrt the above.
> 
> First, let's start with the most fundamental question.  Why is it that 
> routers MUST sign, pass around and verify _in-band_ in the control plane 
> various contents/PDU's _within_ BGP?  Note my very careful use of the word 
> _in-band_.  By that I mean inside the BGP session itself, not on a side-band 
> channel like RPKI and/or IRR is used today.  While I have grave concerns over 
> in-band signing & verification, I am [much] less concerned about the latter 
> for several reasons.  With respect to in-band:
> 1)  I'm extremely concerned over dependencies of automatically "trusting" 
> signed data in-band within the control plane and not being able to reach 
> servers (RP's?) to verify the contents of the PDU's are legitimate.  At least 
> with prefix-filters and/or AS_PATH filters, it's very easy for me to manually 
> disable some or all filtering for particular destinations in order to, say, 
> get reachability to servers (RP's) to verify the authenticity of data.
> 2)  Related to point #1, we really should go back to first principles and ask 
> ourselves if we're really intending to conflate the _transport_ method (BGP) 
> with the requirement to verify the data _inside_ of BGP.  If so, what is the 
> reason?  Is it solely for convenience, (because BGP transport is already 
> there), or other reasons?


Because it's the easiest place to do it.  I once had someone tell me that 
something should be carried in BGP because it was "easier" than duplicating all 
the code for a new network protocol to do the data copying/replication.  As a 
non-professional implementer, I considered their opinion valuable.

> 3)  I really, really don't like the idea of "will need crypto assist and more 
> ram" on my RE/RP's for several reasons, namely:
>    a)  It's one more set of variables that my already over-worked Capacity 
> Planning and NOC groups need to keep track of and attempt to stay ahead of.
>    b)  It's extremely costly to upgrade RE/RP's, because said RE/RP's are 
> only available from one source -- equipment vendors.  And, the upgrade paths 
> typically don't buy you much in terms of more CPU, etc., because vendors are 
> obligated to source "established" components they know they'll be able to 
> acquire for several years into the future.  And, worst of all, the cycle to 
> get those RE/RP's into the network is extremely long when you start to 
> consider the budgeting, testing of new code, physical installation, customer 
> disruption during maintenance windows, etc.
> ... at least with respect to (b), if I were able to use offboard CPU (i.e.: 
> Intel/AMD servers, like in the RPKI/IRR world), then I have a much larger 
> selection of HW to choose from and I can upgrade those in the network much, 
> much more quickly.

I think there are a few things here that are certainly valid concerns.  I'd like 
to reframe them as follows, based on my observations:

1) The transport core of the network, where all these control-plane messages 
currently happen, is growing and requires dense, high-bandwidth routers.
2) When you upgrade the router/fabric/whatnot you typically need some new 
routing-engine, RSP, CPU, etc. that is typically "faster".  Maybe not fast 
enough for some, but they tend to be "faster".
3) As a percentage of the cost of an overall upgrade (take T640 -> T1600), the 
RE component is not the most expensive part by far.
4) Cisco/Juniper have some products that are certainly long in the tooth and 
honestly are too underpowered for even their own aggressive CPU 
consumption/growth.  In some cases it's so bad that they don't realize the 
performance is poor on the older hardware, because their labs typically have 
(on average) newer hardware than the core provider networks do.


As of today, crypto can be expensive if you don't have something designed to 
accelerate it (e.g., hardware assist for SSL on some platforms).  Just because 
it can be expensive doesn't mean it is catastrophic.  I'd rather Cisco stop 
leaking memory when processing BGP attributes, but 6 months of the TAC staring 
at the problem didn't seem to help them.  (Or Juniper stop dying due to IPv6 
BGP attributes on 4-byte routes.)

I don't necessarily trust anyone in this arena.  There's a lot of potential for 
harm.  I don't want my routers to rely on some external device.  You may be 
willing to live with that, but I cannot.  I cannot allow my network to be 
constrained by some COTS PC where the vendors are unwilling to support it, or 
decide that it's cheaper to just refund the entire cost than fix/replace it (as 
is happening with HP/Dell on some devices now - google 'sandy bridge' and/or 
'cougar point').  While an ECO is something painful for anyone, I would rather 
it be on something where the vendor is going to stand behind it, vs people who 
can't trust you to plug in anything (optics, another vendor's bgp speaking 
device, etc) and have their devices operate properly.

There needs to be some built-in inherent trust, and as an operator here are the 
things I look for:

- ability to simulate something, e.g. sidr/roa checks, as_path validation, etc. 
in the cli
- ability to soft-fail items (appropriate logging/data import and export)
- ability to bootstrap devices
- ability to hard-fail items (above)
- ability to properly log anything!
  (developers are really bad about this, either they dump a full raw hex item, 
or a worthless message that describes an error condition that is not 
decipherable as they don't realize they are part of a full system vs their 
individual zone of importance/ego)
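
To make the simulate / soft-fail / hard-fail distinction concrete, here is a 
minimal sketch in Python.  It is not any vendor's implementation or CLI.  The 
names (Mode, Decision, apply_policy) are hypothetical, and the validation 
result is assumed to come from some separate check (e.g. a ROA lookup) that is 
not shown.

```python
import logging
from dataclasses import dataclass
from enum import Enum

log = logging.getLogger("route-validation")


class Mode(Enum):
    """Hypothetical operator-selectable handling for routes that fail validation."""
    SIMULATE = "simulate"    # evaluate and log only; the outcome never changes
    SOFT_FAIL = "soft-fail"  # accept the route, but tag it as invalid
    HARD_FAIL = "hard-fail"  # reject the route


@dataclass
class Decision:
    accept: bool          # should the route be installed?
    tagged_invalid: bool  # should the route carry an "invalid" marker?
    reason: str           # human-readable explanation for the log


def apply_policy(prefix: str, origin_as: int, valid: bool, mode: Mode) -> Decision:
    """Turn an (assumed, externally computed) validation result into a decision."""
    if valid:
        return Decision(True, False, "valid")

    # Log something a human can read: which route, which origin, which mode.
    reason = f"validation failed for {prefix} origin AS{origin_as} (mode={mode.value})"
    log.warning(reason)

    if mode is Mode.HARD_FAIL:
        return Decision(False, True, reason)
    if mode is Mode.SOFT_FAIL:
        return Decision(True, True, reason)
    # SIMULATE: report what would have happened, change nothing.
    return Decision(True, False, reason)
```

Running a new check in SIMULATE first and reading the log before moving to 
SOFT_FAIL and then HARD_FAIL is the kind of staged rollout the list above is 
asking for.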

> At least with respect to #1 and #2, I don't see any discussion of the above 
> in the current draft (although maybe I missed it?).  But, IMHO, those are 
> _fundamental_ requirements that need to be discussed among the WG.  Before 
> touching on any of the other points in Russ White's e-mails in this thread, 
> (which I agree with), I think it's important to get back to basics.
> 
> 
>> and the operators with whom we discussed (note that i am an operator,
>> not a vendor with a bad habit of speaking for operators) this thought
>> that this needed to be said from both ends of the scale.  we did not
>> want the future security constrained by a 7200, nor did we want an
>> explosion in costs.  as dollars are the bottom line in our capitalist
>> culture, constraining them seems quite reasonable.
> 
> It wasn't discussed with me.  :-)

I like the fact that Randy has talked about how much incremental CPU time the 
validation takes compared with processing a prefix-list/as-paths, etc.  That is 
real comparison data.  I'm sure he can provide you links showing that the 
"puny" CPUs can keep up, and you don't need a 52-way CPU system to process the 
bgp messages.

(I am also frustrated that too few of my colleagues/peers have time to 
participate in this stuff, and also that it takes so much time away from my 
family.  Then again, I'm sure I'm some sort of a tech addict as well.)

- Jared
_______________________________________________
sidr mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/sidr
