Paul,
Maybe the NANOG conference committee (or whatever it's called) could get a
couple of major router vendor gerbils to come to the next NANOG and talk to
this issue?
Maybe?
Okay, I give up.
Recently I've been involved in some issues such as this working with
Alcatel Lucent and Cisco to
On Sun, Aug 29, 2010 at 10:12:35PM +0200, Thomas Mangin wrote:
It would seem to me that there should actually be a better option, e.g.
recognizing the malformed update, and simply discarding it (and sending the
originator an error message) instead of resetting the session.
Resetting of
Apart from one big vendor, most BGP speakers only send KEEPALIVEs when they
need to. So on my full feeds I see sessions running for more than 1 month
which received fewer than 300 KEEPALIVE packets.
The negotiated hold time is always the lower of the values presented by the two
routers. The default
On Mon, 2010-08-30 at 10:58 +0200, Thomas Mangin wrote:
http://www.faqs.org/rfcs/rfc4271.html section 4.2
So unless you know something I don't, I believe you are totally mistaken :)
updates serve as implicit keepalives.
in that same section:
Hold Time:
The calculated value indicates the
On Mon, 2010-08-30 at 10:58 +0200, Thomas Mangin wrote:
http://www.faqs.org/rfcs/rfc4271.html section 4.2
So unless you know something I don't, I believe you are totally mistaken :)
updates serve as implicit keepalives.
Rule #1 do not post when you are not awake yet and quote the text
Thomas,
Wouldn't the confusion come from the fact that updates are considered as
keepalives, so that Claudio sees so few type 4 messages because he receives
updates?
Sec 4.2, Hold Time :
The calculated value indicates the maximum number of
seconds that may elapse between the receipt of
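The hold-time behaviour discussed in these messages can be sketched roughly as follows. This is a minimal illustration, not any vendor's implementation: the names `negotiate_hold_time` and `HoldTimer` are hypothetical, and the only rules assumed are the ones from RFC 4271 (the smaller proposed hold time wins, it must be zero or at least three seconds, and both KEEPALIVE and UPDATE messages restart the hold timer).

```python
import time
from typing import Optional


def negotiate_hold_time(local: int, remote: int) -> int:
    """Return the hold time both peers use: the smaller of the two proposals."""
    hold = min(local, remote)
    # RFC 4271 requires the hold time to be zero or at least three seconds.
    if hold != 0 and hold < 3:
        raise ValueError("hold time must be 0 or >= 3 seconds")
    return hold


class HoldTimer:
    """Tracks whether the peer has been silent for longer than the hold time."""

    def __init__(self, hold_time: int, now: Optional[float] = None) -> None:
        self.hold_time = hold_time
        self.last_heard = time.monotonic() if now is None else now

    def on_message(self, msg_type: str, now: Optional[float] = None) -> None:
        # Both KEEPALIVE and UPDATE restart the timer, which is why a busy
        # full-feed session may carry very few explicit KEEPALIVEs.
        if msg_type in ("KEEPALIVE", "UPDATE"):
            self.last_heard = time.monotonic() if now is None else now

    def expired(self, now: Optional[float] = None) -> bool:
        if self.hold_time == 0:
            return False  # a hold time of zero disables the timer
        current = time.monotonic() if now is None else now
        return current - self.last_heard > self.hold_time
```

The `now` parameter exists only so the timer can be exercised with fixed timestamps; a real speaker would simply use its own clock.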
Florian Weimer wrote:
This whole thread is quite schizophrenic because the consensus appears
to be that (a) a *researcher is not to blame* for sending out a BGP
message which eventually leads to session resets, and (b) an
*implementor is to blame* for sending out a BGP message which
eventually
Date: Mon, 30 Aug 2010 10:55:03 -0500
From: Jack Bates jba...@brightok.net
Florian Weimer wrote:
This whole thread is quite schizophrenic because the consensus appears
to be that (a) a *researcher is not to blame* for sending out a BGP
message which eventually leads to session resets,
At 12:40 PM 8/30/2010, Kevin Oberman wrote:
The only way they could have caught this one was to have tested to a
CRS which had another router to which it was announcing the attribute in
a malformed packet. Worse, the resets should just keep happening as the
CRS would still have the route with
On Mon, Aug 30, 2010 at 15:55, Jack Bates jba...@brightok.net wrote:
...
As good a place to break in on the thread as any, I guess. Randy and others
believe more testing should have been done. I'm not completely sure they
didn't test against XR. They very likely could have tested in a 1 on 1
On Sat, 28 Aug 2010, Brett Frankenberger wrote:
The implementor is to blame because the code he wrote sends out BGP
messages which were not properly formed.
People talk about not dropping sessions but instead dropping malformed
messages. This is not safe. We've seen ISIS (which is TLV based
This was *silent* error/corruption. I'm not sure I prefer to have
silent problems instead of tearing down the session which is
definitely noticeable.
i call the silent fix do-gooder software. it means to do good. when
it works, nobody notices or says thanks. when it fails, there is hell
to
On Sun, Aug 29, 2010 at 12:23 AM, Mikael Abrahamsson swm...@swm.pp.se
wrote:
On Sat, 28 Aug 2010, Brett Frankenberger wrote:
The implementor is to blame because the code he wrote sends out BGP
messages which were not properly formed.
People talk
Guys/girls/furry-creatures-from-!Earth,
Complaining on nanog-ml is likely to only achieve personal stress relief.
This is something you should bring up with your vendor. Say that you'll
move vendors if they don't start making better BGP implementations and
adding the features you guys want. Make
On Sun, Aug 29, 2010 at 12:35 AM, Adrian Chadd adr...@creative.net.au
wrote:
Guys/girls/furry-creatures-from-!Earth,
Complaining on nanog-ml is likely to only achieve personal stress relief.
This is something you should bring up with your
On Aug 29, 2010, at 2:30 PM, Paul Ferguson wrote:
It would seem to me that there should actually be a better option, e.g.
recognizing the malformed update, and simply discarding it (and sending the
originator an error message) instead of resetting the session.
Generation of the error
On Sun, Aug 29, 2010 at 12:30:21AM -0700, Paul Ferguson wrote:
It would seem to me that there should actually be a better option, e.g.
recognizing the malformed update, and simply discarding it (and sending the
originator an error message) instead of resetting the session.
Resetting of BGP
Richard A Steenbergen r...@e-gerbil.net writes:
Just out of curiosity, at what point will we as operators rise up
against the ivory tower protocol designers at the IETF and demand that
they add a mechanism to not bring down the entire BGP session because of
a single malformed attribute?
On 8/29/10 3:23 AM, Mikael Abrahamsson wrote:
On Sat, 28 Aug 2010, Brett Frankenberger wrote:
The implementor is to blame because the code he wrote sends out BGP messages
which were not properly formed.
People talk about not dropping sessions but instead dropping malformed
messages. This is
On 8/29/10 9:31 AM, Bjørn Mork wrote:
Richard A Steenbergen r...@e-gerbil.net writes:
Just out of curiosity, at what point will we as operators rise up
against the ivory tower protocol designers at the IETF and demand that
they add a mechanism to not bring down the entire BGP session
On 8/27/10 1:07 PM, Mike Gatti wrote:
Where's the change management process in all of this?
Basically now we are going to start changing things that can
potentially have an adverse effect on users without letting anyone know
beforehand. Interesting concept.
BGP is transitive, change
It seems that creating a worst case BGP test suite for all kinds of nastiness
(in light of the recent RIPE thing) might not be a bad idea - so that we can
all test the implementation ourselves before we deploy new code.
Normally those things are done by vendors - that's what we pay them good
It would seem to me that there should actually be a better option, e.g.
recognizing the malformed update, and simply discarding it (and sending the
originator an error message) instead of resetting the session.
Resetting of BGP sessions should only be done in the most dire of
circumstances,
On Sun, Aug 29, 2010 at 3:12 PM, Thomas Mangin
thomas.man...@exa-networks.co.uk wrote:
However to make sense you would need to find a resynchronisation point to
only exclude the one faulty message. Initially I thought that the last
received KEEPALIVE (for the receiver of the error message)
Every BGP message header has a portion that starts with 16
all-bits-1 octets, for compatibility.
This is distinctive enough an implementation can guess where the next
message starts.
i desperately feared reading this. i do not want to bet the internet on
guessing where anything starts.
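To make the disagreement concrete, here is a small sketch of the two approaches: framing messages from the header's length field, versus scanning for the 16 all-ones octets to guess where the next message begins. The function names are hypothetical; the header layout (16-octet marker, 2-octet length, 1-octet type, length between 19 and 4096) is taken from RFC 4271. The scan is a heuristic only, since nothing stops a message body from containing the same 16 octets.

```python
import struct

MARKER = b"\xff" * 16      # 16 all-ones octets at the start of every header
HEADER_LEN = 19            # marker (16) + length (2) + type (1)
MAX_LEN = 4096             # maximum message size in RFC 4271


def parse_header(buf: bytes) -> tuple:
    """Return (length, type) from a BGP header, or raise ValueError."""
    if len(buf) < HEADER_LEN:
        raise ValueError("short header")
    if buf[:16] != MARKER:
        raise ValueError("bad marker")
    length, msg_type = struct.unpack("!HB", buf[16:19])
    if not HEADER_LEN <= length <= MAX_LEN:
        raise ValueError("bad length")
    return length, msg_type


def guess_next_message(buf: bytes, start: int = 0) -> int:
    """Heuristic resync: offset of the next plausible header, or -1.

    This is a guess, not a guarantee: payload bytes can look like a header.
    """
    pos = buf.find(MARKER, start)
    while pos != -1:
        try:
            parse_header(buf[pos:pos + HEADER_LEN])
            return pos
        except ValueError:
            pos = buf.find(MARKER, pos + 1)
    return -1
```

For example, a KEEPALIVE is just the bare header, `MARKER + struct.pack("!HB", 19, 4)`, and `guess_next_message` will locate it after leading garbage, but it would equally "locate" any attribute value that happens to contain a marker-like run followed by a plausible length.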
I'm assuming that they weren't really expecting this to cause issues... Where
does one draw the line? I'm planning on announcing x.y.z.0/20 later in the
week -- x, y and z are all prime and the sum of all 3 is also a prime. There
is a non-zero chance that something somewhere will go flooie,
imiho, researchers injecting data into the control plane are responsible
to have tested it at least against major bgp speakers. and, considering
its placement in the net (big core), i consider ios xr to be a major
speaker.
i suspect that these folk will test better next time. i sure hope so.
On 28 Aug 2010, at 08:56, Randy Bush wrote:
imiho, researchers injecting data into the control plane are responsible
to have tested it at least against major bgp speakers. and, considering
its placement in the net (big core), i consider ios xr to be a major
speaker.
i suspect that these
i suspect that these folk will test better next time. i sure hope so.
Not sure the researcher can afford to buy a ios xr and may not have
access to one !
then ask on *nog for someone against whom they can test.
randy
On (2010-08-28 09:22 +0100), Thomas Mangin wrote:
i suspect that these folk will test better next time. i sure hope so.
Not sure the researcher can afford to buy a ios xr and may not have access to
one !
Indeed.
Also testing is hard, especially so, when you essentially need to reinvent
On Sat, Aug 28, 2010 at 09:22:34AM +0100, Thomas Mangin wrote:
On 28 Aug 2010, at 08:56, Randy Bush wrote:
imiho, researchers injecting data into the control plane are responsible
to have tested it at least against major bgp speakers. and, considering
its placement in the net (big core),
i suspect that these folk will test better next time. i sure hope
so.
Not sure the researcher can afford to buy a ios xr and may not have
access to one !
Also testing is hard
so is cleaning up the mess when you screw up enough of the internet to
make the international press.
Maybe we as
while this is undoubtedly true for hobbyist researchers, there are
pretty good relationships between vendors and some research facilities
with a strong interest in ensuring there is external review of the
code base(es).
(I am personally aware of at least five such
On (2010-08-28 18:20 +0900), Randy Bush wrote:
a bgp regression suite would not have caught this as it was not a
repeat. but it sure would be useful to implementors.
Naturally 'proving' that non-trivial software works is practically
impossible. But stating what non-existing test-suite would
I am really surprised by these attitudes. Guys (and gals), these
incidents simply go to reinforce that the software we depend on, has
not received sufficient testing and that we all have gigantic
exposures due to things outside of our direct control
nice anti-vendor rant. but over the last
Quagga is even worse than Cisco when it comes to packet validation but it
should not surprise anyone :p
To substantiate my claim, my mercurial log tells me that for MPRNLRI and
MPURNLRI having the flag set as Transitive instead of Optional did not cause
Quagga to complain. It just took the
* Christopher Morrow:
(you are asking your vendors to run full bit sweeps of each protocol
in a regimented manner checking for all possible edge cases and
properly handling them, right?)
The real issue is that both spec and current practice say you need to
drop the session as soon as you
On 08/28/2010 11:39 AM, Saku Ytti wrote:
On (2010-08-28 18:20 +0900), Randy Bush wrote:
a bgp regression suite would not have caught this as it was not a
repeat. but it sure would be useful to implementors.
Naturally 'proving' that non-trivial software works is practically
Those tools are not suitable for regression testing (I know, I wrote exabgp),
not saying they could not be adapted though.
Fuzzing may return crashes or issues with the daemon but it is unlikely. You
need predictable input for regression testing and in our particular case how do
you detect a
* Randy Bush:
imiho, researchers injecting data into the control plane are
responsible to have tested it at least against major bgp speakers.
Practically, this boils down to don't do that, which is certainly
fine by me.
To carry out such experiments responsibly, you have to conduct so much
* Randy Bush:
a bgp regression suite would not have caught this as it was not a
repeat.
Eh, it was just another corrupt-and-propagate issue combined with the
broken (but RFC-required) session reset policy on malformed updates.
On (2010-08-28 13:23 +0200), Thomas Mangin wrote:
Those tools are not suitable for regression testing (I know, I wrote exabgp),
not saying they could not be adapted though.
Fuzzing may return crashes or issues with the daemon but it is unlikely. You
need predictable input for regression
On Sat, Aug 28, 2010 at 04:56:05PM +0900, Randy Bush wrote:
imiho, researchers injecting data into the control plane are responsible
to have tested it at least against major bgp speakers. and, considering
its placement in the net (big core), i consider ios xr to be a major
speaker.
i
I think that focusing on researchers (who we assume are good-intentioned)
misses the point. Any connected BGP speaker can inject any form of ugliness.
The routers that mishandled these updates were bounded by routers that were
able to 'properly' handle corrupted updates.
The question of
My point was not about crafted BGP messages to test border cases - this is what
one would expect in a regression suite.
It is about using a fuzzer to corrupt packets, when you then do not know whether
the router is behaving correctly or not.
---
from my iPhone
On 28 Aug 2010, at 13:36, Saku
On Sat, Aug 28, 2010 at 01:09:47PM +0200, Leen Besselink wrote:
On 08/28/2010 11:39 AM, Saku Ytti wrote:
On (2010-08-28 18:20 +0900), Randy Bush wrote:
a bgp regression suite would not have caught this as it was not a
repeat. but it sure would be useful to implementors.
To carry out such experiments responsibly, you have to conduct so much
testing beforehand that the live test on the actual Internet will not
yield new insights (assuming you did your pre-experiment testing
properly).
you seem to assume the purpose of the test was to see if routers
crashed. i
On 08/28/2010 01:52 PM, Thomas Mangin wrote:
My point was not about crafted BGP messages to test border cases - this is
what one would expect in a regression suite.
It is about using a fuzzer to corrupt packets, when you then do not know
whether the router is behaving correctly or not.
* Claudio Jeker:
I think you blame the wrong people. The vendor should make sure that
their implementation does not violate the very basics of the BGP
protocol.
The curious thing here is that the peer that resets the session, as
required by the spec, causes the actual damage (the session
Am I the only one on the list who saw the sentence in Cisco's
Advisory, "Before sending the unknown attribute to peers, the IOS XR
corrupted it", which clearly states this was a bug?!
Hi!
I think you blame the wrong people. The vendor should make sure that
their implementation does not violate the very basics of the BGP
protocol.
The curious thing here is that the peer that resets the session, as
required by the spec, causes the actual damage (the session reset),
and not
* Raymond Dijkxhoorn:
Not sure if the link was posted already ...
http://www.cisco.com/en/US/products/products_security_advisory09186a0080b4411f.shtml
Cisco posts their advisories to the NANOG list.
'The vulnerability manifests itself when a BGP peer announces a prefix
with a specific,
* Randy Bush:
To carry out such experiments responsibly, you have to conduct so much
testing beforehand that the live test on the actual Internet will not
yield new insights (assuming you did your pre-experiment testing
properly).
you seem to assume the purpose of the test was to see if
Hi!
Cisco posts their advisories to the NANOG list.
'The vulnerability manifests itself when a BGP peer announces a prefix
with a specific, valid but unrecognized transitive attribute. On
receipt of this prefix, the Cisco IOS XR device will corrupt the
attribute before sending it to the
We had ASN4, AS-PATH and this one. More or less we hit this session reset
problem once a year but nothing was done yet to change the RFC.
So I am to blame as much as every network engineer for not having pushed for a
change or at least a comprehensive explanation on the session teardown
behaviour
I agree a correctly framed invalid packet should be discarded without tearing
the session down.
This statement is way too simplistic.
I would be interested if anyone has pointers toward any work which was done to
sort this issue.
Thanks.
Thomas
On Sat, Aug 28, 2010 at 02:51:17PM +0200, Thomas Mangin wrote:
We had ASN4, AS-PATH and this one. More or less we hit this session
reset problem once a year but nothing was done yet to change the RFC.
You are mixing up three totally different problems. Sure the result was the
same (session
On Sat, Aug 28, 2010 at 02:19:28PM +0200, Florian Weimer wrote:
* Claudio Jeker:
I think you blame the wrong people. The vendor should make sure that
their implementation does not violate the very basics of the BGP
protocol.
The curious thing here is that the peer that resets the
On Fri, Aug 27, 2010 at 2:33 PM, Dave Israel da...@otd.com wrote:
On 8/27/2010 3:22 PM, Jared Mauch wrote:
[snip]
an MD5 hash that can be added to the packet. If the TCP hash checks
Hello, layering violation. If the TCP MD5 option was used, the
MD5 checksum was probably correct.
On Sat, Aug 28, 2010 at 6:14 AM, Florian Weimer f...@deneb.enyo.de wrote:
* Christopher Morrow:
(you are asking your vendors to run full bit sweeps of each protocol
in a regimented manner checking for all possible edge cases and
properly handling them, right?)
The real issue is that both
Morrow morrowc.li...@gmail.com
To: Florian Weimer f...@deneb.enyo.de
Cc: nanog@nanog.org nanog@nanog.org
Sent: Sun Aug 29 01:12:00 2010
Subject: Re: Did your BGP crash today?
On Sat, Aug 28, 2010 at 6:14 AM, Florian Weimer f...@deneb.enyo.de wrote:
* Christopher Morrow:
(you are asking your
I did see some attribute 99 stuff go around earlier today and have not yet
researched it.
Unknown BGP attribute 99 (flags: 240)
Unknown BGP attribute 99 (flags: 240)
Unknown BGP attribute 99 (flags: 240)
Unknown BGP attribute 99 (flags: 240)
Unknown BGP attribute 99 (flags: 240)
- Jared
On Aug
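The `flags: 240` in the log lines above can be read bit by bit. As a small sketch (the helper name `decode_attr_flags` is made up for illustration), using the attribute flag bits defined in RFC 4271, where the high-order four bits mean Optional, Transitive, Partial and Extended Length and the low-order four bits are unused:

```python
# Attribute flag bits from RFC 4271, section 4.3 (high-order bit first).
ATTR_FLAG_BITS = (
    (0x80, "optional"),
    (0x40, "transitive"),
    (0x20, "partial"),
    (0x10, "extended-length"),
)


def decode_attr_flags(flags: int) -> list:
    """Return the names of the flag bits set in a path attribute flags octet."""
    if not 0 <= flags <= 0xFF:
        raise ValueError("flags must fit in one octet")
    return [name for bit, name in ATTR_FLAG_BITS if flags & bit]


print(decode_attr_flags(240))
```

Since 240 is 0xF0, all four bits are set: the attribute was optional and transitive, already marked partial by some speaker on the path that did not recognize it, and used the two-octet extended length field.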
On Fri, 27 Aug 2010 19:27:06 +0200, Kasper Adel said:
Havent seen a thread on this one so thought i'd start one.
Ripe tested a new attribute that crashed the internet, is that true?
If it in fact crashed the internet, as opposed to gave a few buggy routers
here and there indigestion, you
No downtime here. It would have been all over the news and everything if it
really did crash the internet.
Nick Olsen
Network Operations
(321) 205-1100 x106
From: Kasper Adel karim.a...@gmail.com
Sent: Friday, August 27, 2010 1:27 PM
To: NANOG list
Well played, Sir.
Nick Olsen
Network Operations
(321) 205-1100 x106
From: valdis.kletni...@vt.edu
Sent: Friday, August 27, 2010 1:32 PM
To: Kasper Adel karim.a...@gmail.com
Subject: Re: Did your BGP crash today?
On Fri, 27 Aug 2010 19:27:06 +0200
Looking at the graph of at least one of the European exchanges where RIS
connects, it had an impact. Now saying it was nothing is like saying that the
YouTube incident was nothing as you were not affected as you do not use YouTube.
Some people did feel the pain - lucky it was not you :)
Thomas
-networks.co.uk]
Sent: Friday, August 27, 2010 11:44 AM
To: n...@brevardwireless.com
Cc: nanog@nanog.org
Subject: Re: Did your BGP crash today?
Looking at the graph of at least one of the European exchanges where RIS
connects, it had an impact. Now saying it was nothing is like saying that the
YouTube incident
On 27-08-10 19:31, valdis.kletni...@vt.edu wrote:
On Fri, 27 Aug 2010 19:27:06 +0200, Kasper Adel said:
Havent seen a thread on this one so thought i'd start one.
Ripe tested a new attribute that crashed the internet, is that true?
If it in fact crashed the internet, as opposed to gave a few
On 27 Aug 2010, at 19:27, Grzegorz Janoszka wrote:
On 27-08-10 19:31, valdis.kletni...@vt.edu wrote:
On Fri, 27 Aug 2010 19:27:06 +0200, Kasper Adel said:
Havent seen a thread on this one so thought i'd start one.
Ripe tested a new attribute that crashed the internet, is that true?
If it
FYI:
--
Dear Colleagues,
On Friday 27 August, from 08:41 to 09:08 UTC, the RIPE NCC Routing
Information Service (RIS) announced a route with an experimental BGP
attribute. During this announcement, some Internet Service
So much for better left off public mailing lists ! sigh !
Thomas
On 27 Aug 2010, at 19:42, Lucy Lynch wrote:
FYI:
--
Dear Colleagues,
On Friday 27 August, from 08:41 to 09:08 UTC, the RIPE NCC Routing
Information
sorry - found via google...
- Lucy
On Fri, 27 Aug 2010, Thomas Mangin wrote:
So much for better left off public mailing lists ! sigh !
Thomas
On 27 Aug 2010, at 19:42, Lucy Lynch wrote:
FYI:
--
Dear Colleagues,
On
On 27-08-10 20:41, Thomas Mangin wrote:
I think most of the impact was limited to Europe, especially Amsterdam area.
Yes, it had an effect on ISPs which are connected to RIS.
http://www.ripe.net/ris/
AFAIK this means ASes at LINX and AMS-IX. The LINX graph shows a similar (but
smaller) dip of
On 27 Aug 2010, at 20:03, Grzegorz Janoszka wrote:
On 27-08-10 20:41, Thomas Mangin wrote:
I think most of the impact was limited to Europe, especially Amsterdam area.
Yes, it had an effect on ISPs which are connected to RIS.
http://www.ripe.net/ris/
AFAIK this means ASes at LINX and AMS-IX
On Fri, Aug 27, 2010 at 01:29:15PM -0400, Jared Mauch wrote:
Unknown BGP attribute 99 (flags: 240)
Unknown BGP attribute 99 (flags: 240)
Unknown BGP attribute 99 (flags: 240)
Unknown BGP attribute 99 (flags: 240)
Unknown BGP attribute 99 (flags: 240)
Just out of curiosity, at what point
On 2010-08-27 21:13, Richard A Steenbergen wrote:
On Fri, Aug 27, 2010 at 01:29:15PM -0400, Jared Mauch wrote:
Unknown BGP attribute 99 (flags: 240)
Unknown BGP attribute 99 (flags: 240)
Unknown BGP attribute 99 (flags: 240)
Unknown BGP attribute 99 (flags: 240)
Unknown BGP attribute 99
On Aug 27, 2010, at 3:13 PM, Richard A Steenbergen wrote:
On Fri, Aug 27, 2010 at 01:29:15PM -0400, Jared Mauch wrote:
Unknown BGP attribute 99 (flags: 240)
Unknown BGP attribute 99 (flags: 240)
Unknown BGP attribute 99 (flags: 240)
Unknown BGP attribute 99 (flags: 240)
Unknown BGP
On Aug 27, 2010, at 3:17 PM, Jeroen Massar wrote:
On 2010-08-27 21:13, Richard A Steenbergen wrote:
On Fri, Aug 27, 2010 at 01:29:15PM -0400, Jared Mauch wrote:
Unknown BGP attribute 99 (flags: 240)
Unknown BGP attribute 99 (flags: 240)
Unknown BGP attribute 99 (flags: 240)
Unknown BGP
On 8/27/2010 3:22 PM, Jared Mauch wrote:
When you are processing something, it's sometimes hard to tell if something
just was mis-parsed (as I think the case is here with the missing-2-bytes)
vs just getting garbage. Perhaps there should be some way to re-sync when
you are having this
Where's the change management process in all of this?
Basically now we are going to start changing things that can
potentially have an adverse effect on users without letting anyone know
beforehand. Interesting concept.
On Aug 27, 2010, at 3:33 PM, Dave Israel wrote:
On 8/27/2010
On Fri, Aug 27, 2010 at 4:07 PM, Mike Gatti ekim.it...@gmail.com wrote:
Where's the change management process in all of this?
Basically now we are going to start changing things that can
potentially have an adverse effect on users without letting anyone know
beforehand. Interesting
On Aug 27, 2010, at 12:13 PM, Richard A Steenbergen wrote:
Just out of curiosity, at what point will we as operators rise up
against the ivory tower protocol designers at the IETF and demand that
they add a mechanism to not bring down the entire BGP session because of
a single malformed
On Fri, 27 Aug 2010 13:43:39 PDT, Clay Fiske said:
If -everyone- dropped the session on a bad attribute, it likely wouldn't
make it far enough into the wild to cause these problems in the first
place.
That works fine for malformed attributes. It blows chunks for legally formed
but unknown
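The distinction drawn here, malformed versus well-formed but unrecognized, maps onto rules a receiver can apply from the flags octet alone. A rough sketch follows, with a hypothetical function name and action labels; the rules themselves are the ones in RFC 4271 (an unrecognized well-known attribute is an error, an unrecognized optional transitive attribute is passed on with the Partial bit set, and an unrecognized optional non-transitive attribute is quietly ignored).

```python
OPTIONAL, TRANSITIVE, PARTIAL = 0x80, 0x40, 0x20


def handle_unrecognized(flags: int) -> tuple:
    """Return (action, flags_to_use) for a well-formed attribute we do not know.

    The action strings are illustrative labels, not protocol constants.
    """
    if not flags & OPTIONAL:
        # Well-known attributes must be recognized by every speaker.
        return ("notify-unrecognized-well-known", flags)
    if flags & TRANSITIVE:
        # Keep it and pass it along, marking that someone did not understand it.
        return ("propagate", flags | PARTIAL)
    # Optional non-transitive: drop the attribute, keep the route and session.
    return ("ignore", flags)
```

Under these rules an experimental optional transitive attribute is supposed to travel through routers that have never heard of it, which is why dropping the session on anything unfamiliar does not work as a containment strategy.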
On Fri, Aug 27, 2010 at 01:43:39PM -0700, Clay Fiske wrote:
If -everyone- dropped the session on a bad attribute, it likely
wouldn't make it far enough into the wild to cause these problems in
the first place.
And if everyone filtered their BGP customers there would be no routing
leaks,
come on Chris, is the Internet an experiment or not? :)
one would think that a responsible party would have made
efforts to let others in the playground know they were
going to try something different that could have ramifications
on an unknown distribution
On Fri, Aug 27, 2010 at 04:57:17PM -0400, valdis.kletni...@vt.edu wrote:
On Fri, 27 Aug 2010 13:43:39 PDT, Clay Fiske said:
If -everyone- dropped the session on a bad attribute, it likely wouldn't
make it far enough into the wild to cause these problems in the first
place.
That works
On Aug 27, 2010, at 5:37 PM, bmann...@vacation.karoshi.com wrote:
come on Chris, is the Internet an experiment or not? :)
one would think that a responsible party would have made
efforts to let others in the playground know they were
going to try something
On Aug 27, 2010, at 1:57 PM, valdis.kletni...@vt.edu wrote:
On Fri, 27 Aug 2010 13:43:39 PDT, Clay Fiske said:
If -everyone- dropped the session on a bad attribute, it likely wouldn't
make it far enough into the wild to cause these problems in the first
place.
That works fine for
On Fri, Aug 27, 2010 at 5:02 PM, Clay Fiske c...@bloomcounty.org wrote:
On Aug 27, 2010, at 1:57 PM, valdis.kletni...@vt.edu wrote:
That works fine for malformed attributes. It blows chunks for legally
formed but unknown attributes - how
Once upon a time, Paul Ferguson fergdawgs...@gmail.com said:
As an aside, I see that Cisco has released a late Friday afternoon security
advisory on this issue:
Huh, I had an upstream (with Cisco gear on their end) do URGENT
maintenance last night with less than 12 hours notice. I wonder if
Just out of curiosity, at what point will we as operators rise up
against the ivory tower protocol designers at the IETF and demand that
they add a mechanism to not bring down the entire BGP session because
of a single malformed attribute?
there is a problem underlying this. bgp is not tlv.
So much for better left off public mailing lists ! sigh !
damn! security through obscurity busted again. will people never
learn?
/sarcasm?
randy