RE: BGP Experiment

2019-02-06 Thread adamv0025
> From: Randy Bush 
> Sent: Thursday, January 31, 2019 6:56 PM
> 
> > I suspect simple bugs are found by vendor, complex bugs are not
> > economic to find.
> 
> the running internet is complex and has a horrifying number of special
> cases compounded by kiddies being clever.  no one, independent of resource
> requirements, could build a lab to the scale needed to test.
>
Yes what can break will break, yet here we are exchanging emails, I think
your statement assumes a vast search space.
No need to solve the whole thing, just to make my tiny part a bit better. 
No need to solve my tiny part for eternity, just for the near term.
Yes there will always be this long tail, but with what one would deem a
sufficiently low probability, in the intersection of the above search
spaces.

> and then there is ewd's famous quote about testing.
>
Yes human brains have their limits, hence we invented AI to help us solve
complexity.
Though in a sense it's just shifting the complexity to yet another layer
above...
 
adam



Re: BGP Experiment

2019-01-31 Thread Randy Bush
> I suspect simple bugs are found by vendor, complex bugs are not
> economic to find.

the running internet is complex and has a horrifying number of special
cases compounded by kiddies being clever.  no one, independent of
resource requirements, could build a lab to the scale needed to test.

and then there is ewd's famous quote about testing.

randy


Re: BGP Experiment

2019-01-31 Thread Saku Ytti
Hey,

> This is because you did your due diligence during the testing.
> Do you have statistics on the probability of these "complex" bugs occurrence?

No. I wish I had, and I hope to change that: to try to translate how
good an investment testing is, how many customer outages it has saved,
etc.

I suspect simple bugs are found by vendor, complex bugs are not
economic to find. And testing is more proof of work than business
case.

-- 
  ++ytti


RE: BGP Experiment

2019-01-31 Thread adamv0025
> From: Saku Ytti 
> Sent: Friday, January 25, 2019 7:59 AM
> 
> On Thu, 24 Jan 2019 at 18:43,  wrote:
> 
> > We fight with that all the time,
> > I'd say that from the whole Design->Certify->Deploy->Verify->Monitor
> > service lifecycle time budget, the service certification testing is
> > almost half of it.
> > That's why I'm so interested in a model driven design and testing approach.
> 
> This shop has 100% automated blackbox testing, and still they have to
> cherry-pick what to test.
>
Sure one tests only for the few specific current and near future use cases.

> Do you have statistics how often you find show-stopper
> issues and how far into the test they were found? 
>
I don't keep those statistics, but running bug scrubs to determine the
code for regression testing is usually a good starting point for avoiding
show-stoppers. What is then found later during the testing is usually
patched, so yes, you end up with brand new code plus several patches
related to your use cases (PEs, Ps, etc.).
   
> I expect this to be
> exponential curve, like upgrading box, getting your signalling protocols up,
> pushing one packet in each service you sell is easy and fast, I wonder will
> massive amount of work increase confidence significantly from that. 
>
Yes it will.

> The issues I tend to find in production are issues which are not trivial
> to recreate in lab, once we know what they are, which implies that finding
> them a-priori is bit naive expectation. So, assumptions:
>
This is because you did your due diligence during the testing. 
Do you have statistics on the probability of these "complex" bugs occurrence?

> Hopefully we'll enter NOS future where we download NOS from github and
> compile it to our devices. Allowing whole community to contribute to unit
> testing and use-cases and to run minimal bug surface code in your
> environment.
>
Not there yet, but you can compile your own routing protocols and run those on 
vendor OS.

> I see very little future in blackbox testing vendor NOS at operator site,
> beyond quick poke at lab. Seems like poor value. Rather have pessimistic
> deployment plan, lab => staging => 2-3 low risk site =>
> 2-3 high risk site => slow roll up
> 
Yes that's also a possibility -one of the strong arguments for massive 
disaggregation at the edge, to reduce the fallout of a potential critical 
failure.
Depends on the shop really.
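
The pessimistic deployment plan quoted above (lab => staging => low risk
sites => high risk sites => slow roll up) can be sketched as a gated
sequence. This is only an illustration: the stage names come from the
quoted plan, while the site counts, soak periods, the Stage class and the
is_healthy callback are hypothetical and not something anyone in the
thread described.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    sites: int      # how many sites receive the new code in this stage
    soak_days: int  # assumed observation period before moving on


# Stage names follow the quoted plan; counts and soak times are
# illustrative assumptions, not values from the thread.
PLAN = [
    Stage("lab", 1, 2),
    Stage("staging", 1, 7),
    Stage("low-risk sites", 3, 14),
    Stage("high-risk sites", 3, 14),
    Stage("slow roll-up", 50, 30),
]


def run_rollout(plan, is_healthy):
    """Walk the stages in order and stop at the first one that fails.

    is_healthy is a caller-supplied callable that takes a Stage and
    returns True if no show-stopper was seen during its soak period.
    Returns (names_of_completed_stages, name_of_failed_stage_or_None).
    """
    completed = []
    for stage in plan:
        if not is_healthy(stage):
            return completed, stage.name
        completed.append(stage.name)
    return completed, None
```

The point of the gate is that a failure at a low-risk stage limits the
fallout to a few sites, which is the same argument made above for
disaggregation at the edge.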

> > I really need to have this ever growing library of test cases that the
> > automat will churn through with very little human intervention, in order
> > to reduce the testing from months to days or weeks at least.
> 
> Lot of vendor, maybe all, accept your configuration and test them for
> releases. I think this is only viable solution vendors have for blackbox,
> gather configs from customers and test those, instead of try to guess what
> to test. I've done that with Cisco in two companies, unfortunately I can't
> really tell if it impacted quality, but I like to think it did.
> 
Did that with Juniper partners and now directly with Juniper.
The thing is though they are using our test plan...

adam  



Re: BGP Experiment

2019-01-30 Thread hank

On 23/01/2019 19:40, Job Snijders wrote:

I agree with Job.  Continue the experiment and warn us in advance.

-Hank


Dear Ben, all,

I'm not sure this experiment should be canceled. On the public Internet
we MUST assume BGP speakers are compliant with the BGP-4 protocol.
Broken BGP-4 speakers are what they are: broken. They must be fixed, or
the operator must accept the consequences.

"Get a sandbox like every other researcher" is not a fair statement, one
can also posit "Get a compliant BGP-4 implementation like every other
network operator".

When bad guys explicitly seek to target these Asian and Australian
operators you reference (who apparently have not upgraded to the vendor
recommended release), using *valid* BGP updates, will a politely emailed
request help resolve the situation? Of course not!

Stopping the experiment is only treating symptoms, the root cause must
be addressed: broken software.

Kind regards,

Job
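
For context on what "compliant" means here, the following is a minimal
sketch of how, as far as I read RFC 4271, a BGP speaker is expected to
treat a path attribute whose type code it does not recognize. The
function name and the returned action strings are hypothetical; only
the flag bit positions and the three outcomes are taken from the RFC.

```python
OPTIONAL = 0x80    # high-order bit: 1 = optional, 0 = well-known
TRANSITIVE = 0x40  # second bit: 1 = transitive
PARTIAL = 0x20     # third bit: 1 = partial


def handle_unrecognized(flags):
    """Return (action, outgoing_flags) for an unrecognized attribute.

    action is one of the hypothetical labels "notify-error",
    "propagate" or "ignore"; outgoing_flags is only set when the
    attribute is passed on to other speakers.
    """
    if not flags & OPTIONAL:
        # An unrecognized well-known attribute is an error condition.
        return ("notify-error", None)
    if flags & TRANSITIVE:
        # Keep the attribute, mark it Partial, and pass it along.
        return ("propagate", flags | PARTIAL)
    # Optional non-transitive: quietly ignore, do not pass along.
    return ("ignore", None)
```

Under these rules an attribute sent with flags 0xe0 and an unknown type
code falls into the "propagate" branch, so a speaker that resets its
session on receiving one is not following the specification.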




Re: BGP Experiment

2019-01-28 Thread Brian Kantor
On Sun, Jan 27, 2019 at 01:21:56PM -0500, William Allen Simpson wrote:
> On 1/26/19 6:37 PM, Randy Bush wrote:
> > to nick's point.  as nick knows, i am a naggumite; one of my few
> > disagreements with dr postel.  but there is a difference between
> > writing protocol specs/code, and with sending packets on the global
> > internet.  rigor in the former, prudence in the latter.
> > 
> OK, Randy, you piqued my interest: what is a naggumite?
> 
> Many of us disagreed with Jon Postel from time to time, but he
> usually understood the alternative points of view.

I fondly recall that Erik could be quite acerbic, as I think is
well exemplified by this:

   "If I had to deal with you professionally, I would have told you
   to hold the onions and give me large fries."   - Erik Naggum

Unfortunately, I don't recall to whom he said that; I suppose I am
lucky that it wasn't me.
- Brian


Re: BGP Experiment

2019-01-27 Thread Nick Hilliard

William Allen Simpson wrote on 27/01/2019 18:21:

> OK, Randy, you piqued my interest: what is a naggumite?


http://naggum.no/worse-is-better.html

a.k.a. "perfect is the enemy of good enough".

Nick


Re: BGP Experiment

2019-01-27 Thread Randy Bush
> OK, Randy, you piqued my interest: what is a naggumite?

erik naggum, an early and strong proponent of being strict.  you've been
around long enough you should remember erik.

> Many of us disagreed with Jon Postel from time to time, but he usually
> understood the alternative points of view.

oh, i have been dealing with network cowboys (and yes, unsurprisingly
pretty universally boys) for enough decades to mostly understand.  but
the lack of prudence and level of irresponsibility occasionally surprise
me.

randy


Re: BGP Experiment

2019-01-27 Thread Hansen, Christoffer

On 27/01/2019 19:21, William Allen Simpson wrote:
> OK, Randy, you piqued my interest: what is a naggumite?

... (scouring the internet)

o https://www.nanog.org/mailinglist/mailarchives/old_archive/2006-01/msg00250.html
o https://en.wikipedia.org/wiki/Erik_Naggum
o https://www.dictionary.com/browse/-ite

?





Re: BGP Experiment

2019-01-27 Thread William Allen Simpson

On 1/26/19 6:37 PM, Randy Bush wrote:

> to nick's point.  as nick knows, i am a naggumite; one of my few
> disagreements with dr postel.  but there is a difference between
> writing protocol specs/code, and with sending packets on the global
> internet.  rigor in the former, prudence in the latter.


OK, Randy, you piqued my interest: what is a naggumite?

Many of us disagreed with Jon Postel from time to time, but he
usually understood the alternative points of view.


Re: BGP Experiment

2019-01-26 Thread Randy Bush
> As we've discovered after many such events, the overlap between the
> people who read those lists and the people running outdated vulnerable
> software isn't very large.

to steal from a reply to a private message:

there are a jillion folk at the edges of the net running with low end
gear, low margins, and 312 pressures.  *knowingly* abusing them into an
update a week is just not reasonable ops behavior.

and, at the other extreme, big core isps have a pre-deployment test
window of six or more months.  the only win here is that public
embarrassment does help to get the big vendors to give us a fix with
which to start the lab test cycle.  bug reports to tac seem not to.

randy


Re: BGP Experiment

2019-01-26 Thread Owen DeLong



> On Jan 26, 2019, at 16:48, valdis.kletni...@vt.edu wrote:
> 
> On Sat, 26 Jan 2019 11:37:05 -0800, Owen DeLong said:
>> 1.  Compile a list of lists that should be notified of such experiments
>>     in advance. Try to get the word out to as much of the community
>>     as possible through various NOGs and other relevant industry
>>     lists.
> 
> As we've discovered after many such events, the overlap between the people
> who read those lists and the people running outdated vulnerable software
> isn't very large.
> 
> 

While this may be true, if you have a better suggestion for how to reach them, 
I’m all ears. Otherwise, doing the best we can to disseminate the information 
as widely as possible seems the most practicable approach currently available. 

Owen




Re: BGP Experiment

2019-01-26 Thread valdis . kletnieks
On Sat, 26 Jan 2019 11:37:05 -0800, Owen DeLong said:
>   1.  Compile a list of lists that should be notified of such
>       experiments in advance. Try to get the word out to as much of
>       the community as possible through various NOGs and other
>       relevant industry lists.

As we've discovered after many such events, the overlap between the people who
read those lists and the people running outdated vulnerable software isn't very
large.





Re: BGP Experiment

2019-01-26 Thread Randy Bush
> I think a better question is, once a vulnerability has become
> widespread public knowledge, do you expect malicious actors, malware
> authors and intelligence agencies of autocratic nation-states to obey
> a gentlemen's agreement not to exploit something?

false analogy, or maybe just a subject switch.  the 'attacker' was not
a nation state nor intentionally malicious.  it was a naïve researcher
meaning no harm.  in fact, i have co-authored with ítalo, and he is a
very well meaning, and usually cautious, researcher.  he just fell in
with a crew with a rep for ops cluelessness that needed to demonstrate
it once again.

to nick's point.  as nick knows, i am a naggumite; one of my few
disagreements with dr postel.  but there is a difference between
writing protocol specs/code, and with sending packets on the global
internet.  rigor in the former, prudence in the latter.

while it is tragically true that someone will be willing to load mrs
schächter on the cattle car, it damned well ain't gonna be me.

randy


Re: BGP Experiment

2019-01-26 Thread Nick Hilliard

Randy Bush wrote on 26/01/2019 16:15:

> if you know of an out-of-spec vulnerability or bug in deployed router,
> switch, server, ... ops and researchers should exploit it as much as
> possible in order to encourage fixing of the hole.


It came out as "please continue", but the sentiment sounded less like 
malice / ignorance, and more like a lack of sympathy for people who 
leave equipment connected to the dfz which shouldn't be connected to the 
dfz.



> given the number of bugs/vulns, are you comfortable that this is going
> to scale well?  and this is prudent when our primary responsibility is a
> running internet?


This isn't the first time that a malformed IANA BGP attribute 
implementation caused service loss, and it's unlikely to be the last 
time either.


https://sempf.net/post/On-Testing1

Some time in the future, it will be acceptable to continue the DISCO 
experiment along its current lines because bgp stack authors will 
remember the time that attribute 255 caused things to explode and their 
code bases will be resilient to this problem.


When this happens, will it be acceptable to announce prefixes with 
arbitrary unassigned attributes with random contents?  Where does the 
boundary lie between what is and what is not acceptable?  Do we assign a 
time limit after which it's considered generally acceptable to announce 
attributes or capabilities which are known to cause problems?  If 
someone were to set up a beacon system which announced prefixes with 
unassigned attributes and garbage content, is that a useful community 
service or simply a nuisance?


The research people acted correctly in stopping the experiment. They 
could engage with the IETF IDR working group to get a temporary 
attribute code point rather than using 255, and it would be interesting 
to see results from this.
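
For readers unfamiliar with the wire format behind "attribute 255": the
experiment announcement quoted elsewhere in this thread describes the
attribute as flags 0xe0, type 0xff, size 0x20. The sketch below, which
assumes the path attribute layout of RFC 4271 section 4.3 (one flags
octet, one type octet, then a one-octet length, or two octets when the
Extended Length bit is set), shows what such an attribute looks like on
the wire. The function names are hypothetical, and the all-zero payload
is a placeholder, since the thread does not say what the 32 octets
contained. Note that under this layout 0xe0 sets the Partial bit (0x20)
in addition to Optional (0x80) and Transitive (0x40).

```python
import struct

EXTENDED_LENGTH = 0x10  # when set, the length field is two octets


def encode_attribute(flags, type_code, value):
    """Serialize one path attribute: flags, type, length, value."""
    if flags & EXTENDED_LENGTH:
        header = struct.pack("!BBH", flags, type_code, len(value))
    else:
        if len(value) > 255:
            raise ValueError("value needs the Extended Length bit")
        header = struct.pack("!BBB", flags, type_code, len(value))
    return header + value


def decode_attribute(data):
    """Parse one path attribute; returns (flags, type_code, value)."""
    flags, type_code = data[0], data[1]
    if flags & EXTENDED_LENGTH:
        (length,) = struct.unpack("!H", data[2:4])
        offset = 4
    else:
        length = data[2]
        offset = 3
    return flags, type_code, data[offset:offset + length]


# Placeholder payload: 32 zero octets (0x20 octets = 256 bits).
wire = encode_attribute(0xE0, 0xFF, bytes(32))
```

A parser that handles this correctly only needs the length field to
skip over a value it does not understand, which is why an unknown type
code by itself should not be a reason for a session to fail.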


But I'm not convinced that it's feasible for the internet community to 
assert that any particular machination of bgp announcement is out of 
bounds in perpetuity - in the longer term, this will promote systemic 
infrastructural weakness rather than doing what we all aspire to, namely 
creating a more resilient internet.


Nick


Re: BGP Experiment

2019-01-26 Thread Eric Kuhnke
I think a better question is, once a vulnerability has become widespread
public knowledge, do you expect malicious actors, malware authors and
intelligence agencies of autocratic nation-states to obey a gentlemen's
agreement not to exploit something?

There is not a great deal of Venn diagram overlap between "organizations
that will pay $2 million for a zero day remote exploit on the latest
version of iOS" and "people who care about whether Randy Bush recommends
them for a job".


On Sat, Jan 26, 2019 at 8:16 AM Randy Bush  wrote:

> i just want to make sure that folk are really in agreement with what i
> think i have been hearing from a lot of strident voices here.
>
> if you know of an out-of-spec vulnerability or bug in deployed router,
> switch, server, ... ops and researchers should exploit it as much as
> possible in order to encourage fixing of the hole.
>
> given the number of bugs/vulns, are you comfortable that this is going
> to scale well?  and this is prudent when our primary responsibility is a
> running internet?
>
> just checkin'
>
> randy
>
>
> PS: if you think this, speak up so i can note to never hire or recommend
> you.
>
> PPS: Anant Shah, Romain Fontugne, Emile Aben, Cristel Pelsser, and Randy
>  Bush; "Disco: Fast, Good, and Cheap Outage Detection"; TMA 2017
> ^ :)
>


Re: BGP Experiment

2019-01-26 Thread Owen DeLong
I think that’s a bit of reductio ad absurdum from what has been said.

I would prefer that researchers collaborate to:

1.  Compile a list of lists that should be notified of such
    experiments in advance. Try to get the word out to as much of
    the community as possible through various NOGs and other
    relevant industry lists.

2.  Use said list of lists to provide at least 7 days advance
    notice of such testing, ideally with links to the details of
    the vulnerability in question and known vulnerable and known
    good code bases for as many software/hardware platforms as
    feasible. (Ideally list unknowns and solicit feedback as well).

3.  Provide contact information for reporting test-related problems,
    issues, affected software versions, etc. Ideally an email
    address for after-action reports of data and a phone number
    that will be monitored during active testing for emergent
    reports of test-related service disruptions.

4.  Conduct the test for incrementally longer periods over time,
    e.g. start with a 15 minute test on the first try and then run
    30, 60, and multi-hour tests on later dates after addressing
    any reported problems during earlier tests.

I think such behavior would provide the best intersection of encouraging
patching/fixing while also minimizing disruption and harm to innocent
third parties.
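
Items 2 and 4 above amount to a simple schedule: a notice period, then
test windows of increasing length. A minimal sketch follows. The 7 day
notice and the 15/30/60 minute steps come from the list; the 120 minute
"multi-hour" value, the one week gap between runs and the function name
are assumptions made for the example.

```python
from datetime import datetime, timedelta


def plan_tests(announce_at, durations_min=(15, 30, 60, 120),
               notice_days=7, gap_days=7):
    """Return a list of (start, end) test windows.

    The first window opens notice_days after the announcement; each
    later window opens gap_days after the previous one and runs for
    the next, longer duration.
    """
    first_start = announce_at + timedelta(days=notice_days)
    schedule = []
    for i, minutes in enumerate(durations_min):
        start = first_start + timedelta(days=i * gap_days)
        schedule.append((start, start + timedelta(minutes=minutes)))
    return schedule


# Example with an arbitrary announcement time.
windows = plan_tests(datetime(2019, 1, 1, 14, 30))
```

In practice each later window would only go ahead once problems
reported during the earlier ones had been addressed, which is a manual
decision this sketch does not model.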

Owen


> On Jan 26, 2019, at 8:15 AM, Randy Bush  wrote:
> 
> i just want to make sure that folk are really in agreement with what i
> think i have been hearing from a lot of strident voices here.
> 
> if you know of an out-of-spec vulnerability or bug in deployed router,
> switch, server, ... ops and researchers should exploit it as much as
> possible in order to encourage fixing of the hole.
> 
> given the number of bugs/vulns, are you comfortable that this is going
> to scale well?  and this is prudent when our primary responsibility is a
> running internet?
> 
> just checkin'
> 
> randy
> 
> 
> PS: if you think this, speak up so i can note to never hire or recommend
>you.
> 
> PPS: Anant Shah, Romain Fontugne, Emile Aben, Cristel Pelsser, and Randy
> Bush; "Disco: Fast, Good, and Cheap Outage Detection"; TMA 2017
>^ :)



Re: BGP Experiment

2019-01-26 Thread Randy Bush
i just want to make sure that folk are really in agreement with what i
think i have been hearing from a lot of strident voices here.

if you know of an out-of-spec vulnerability or bug in deployed router,
switch, server, ... ops and researchers should exploit it as much as
possible in order to encourage fixing of the hole.

given the number of bugs/vulns, are you comfortable that this is going
to scale well?  and this is prudent when our primary responsibility is a
running internet?

just checkin'

randy


PS: if you think this, speak up so i can note to never hire or recommend
you.

PPS: Anant Shah, Romain Fontugne, Emile Aben, Cristel Pelsser, and Randy
 Bush; "Disco: Fast, Good, and Cheap Outage Detection"; TMA 2017
^ :)


Re: BGP Experiment

2019-01-25 Thread Mark Tees
I did realise a little after this that it would be a no-no to talk about
this, security-wise.

On Sat, 26 Jan 2019 at 12:47, Mark Tees  wrote:

> I might be reading this wrong but it appears only one person has raised an
> issue and then not actually backed it up with data.
>
> Out of the eyes that have views inside the major networks did anyone see
> any issues?
>
> Surely cross posting this to other NOG lists is sufficient.
>
>
> On Sat, 26 Jan 2019 at 09:15, Randy via NANOG  wrote:
>
>> OP is yet to clarify how a single /24 advertisement caused a
>> "massive-prefix spike/flap"; in OP's words.
>>
>> The Experiment should continue.
>> -Randy
>>
>>
>> On Friday, January 25, 2019, 2:32:47 PM PST, Tom Beecher wrote:
>>
>> If I understand this thread correctly, the test caused no actual change in
>> the routing table size or route announcement. That was all a result of the
>> incorrect behavior of the software.
>>
>> Instead of throwing rocks, how about some data instead. We can
>> collaborate and better understand the whole thing to make it better and
>> move on to the next thing. Yelling about "North America" when 4 of the 7
>> listed researchers on the test are NOT IN NORTH AMERICA doesn't really help
>> anything.
>>
>>
>>
>>
>>
>> On Thu, Jan 24, 2019 at 10:25 AM Ben Cooper  wrote:
>> > Can you stop this?
>> >
>> > You caused again a massive prefix spike/flap, and as the internet is
>> > not centered around NA (shock horror!) a number of operators in Asia and
>> > Australia got affected by your “experiment” and had no idea what was
>> > happening or why.
>> >
>> > Get a sandbox like every other researcher, as of now we have black
>> > holed and filtered your whole ASN, and have recommended others do the same.
>> >
>> > On Wed, 23 Jan 2019 at 1:19 am, Italo Cunha  wrote:
>> >> NANOG,
>> >>
>> >> This is a reminder that this experiment will resume tomorrow
>> >> (Wednesday, Jan. 23rd). We will announce 184.164.224.0/24 carrying a
>> >> BGP attribute of type 0xff (reserved for development) between 14:00
>> >> and 14:15 GMT.
>> >>
>> >> On Tue, Dec 18, 2018 at 10:05 AM Italo Cunha wrote:
>> >>>
>> >>> NANOG,
>> >>>
>> >>> We would like to inform you of an experiment to evaluate alternatives
>> >>> for speeding up adoption of BGP route origin validation (research
>> >>> paper with details [A]).
>> >>>
>> >>> Our plan is to announce prefix 184.164.224.0/24 with a valid
>> >>> standards-compliant unassigned BGP attribute from routers operated by
>> >>> the PEERING testbed [B, C]. The attribute will have flags 0xe0
>> >>> (optional transitive [rfc4271, S4.3]), type 0xff (reserved for
>> >>> development), and size 0x20 (256bits).
>> >>>
>> >>> Our collaborators recently ran an equivalent experiment with no
>> >>> complaints or known issues [A], and so we do not anticipate any
>> >>> arising. Back in 2010, an experiment using unassigned attributes by
>> >>> RIPE and Duke University caused disruption in Internet routing due to
>> >>> a bug in Cisco routers [D, CVE-2010-3035]. Since then, this and other
>> >>> similar bugs have been patched [e.g., CVE-2013-6051], and new BGP
>> >>> attributes have been assigned (BGPsec-path) and adopted (large
>> >>> communities). We have successfully tested propagation of the
>> >>> announcements on Cisco IOS-based routers running versions 12.2(33)SRA
>> >>> and 15.3(1)S, Quagga 0.99.23.1 and 1.1.1, as well as BIRD 1.4.5 and
>> >>> 1.6.3.
>> >>>
>> >>> We plan to announce 184.164.224.0/24 from 8 PEERING locations for a
>> >>> predefined period of 15 minutes starting 14:30 GMT, from Monday to
>> >>> Thursday, between the 7th and 22nd of January, 2019 (full schedule and
>> >>> locations [E]). We will stop the experiment immediately in case any
>> >>> issues arise.
>> >>>
>> >>> Although we do not expect the experiment to cause disruption, we
>> >>> welcome feedback on its safety and especially on how to make it safer.
>> >>> We can be reached at disco-experim...@googlegroups.com.
>> >>>
>> >>> Amir Herzberg, University of Connecticut
>> >>> Ethan Katz-Bassett, Columbia University
>> >>> Haya Shulman, Fraunhofer SIT
>> >>> Ítalo Cunha, Universidade Federal de Minas Gerais
>> >>> Michael Schapira, Hebrew University of Jerusalem
>> >>> Tomas Hlavacek, Fraunhofer SIT
>> >>> Yossi Gilad, MIT
>> >>>
>> >>> [A] https://conferences.sigcomm.org/hotnets/2018/program.html
>> >>> [B] http://peering.usc.edu
>> >>> [C] https://goo.gl/AFR1Cn
>> >>> [D] https://labs.ripe.net/Members/erik/ripe-ncc-and-duke-university-bgp-experiment
>> >>> [E] https://goo.gl/nJhmx1
>> >>
>> > --
>> > Ben Cooper
>> > Chief Executive Officer
>> > PacketGG - Multicast
>> > M(Telstra): 0410 411 301
>> > M(Optus):  0434 336 743
>> > E: b...@packet.gg & b...@multicast.net.au
>> > W: https://packet.gg
>> > W: https://multicast.net.au
>> >
>> >
>> >
>>
> --
> Regards,
>
> Mark L. Tees
>
-- 
Regards,

Mark L. Tees


Re: BGP Experiment

2019-01-25 Thread Mark Tees
I might be reading this wrong but it appears only one person has raised an
issue and then not actually backed it up with data.

Out of the eyes that have views inside the major networks did anyone see
any issues?

Surely cross posting this to other NOG lists is sufficient.


On Sat, 26 Jan 2019 at 09:15, Randy via NANOG  wrote:

> OP is yet to clarify how a single /24 advertisement caused a
> "massive-prefix spike/flap"; in OP's words.
>
> The Experiment should continue.
> -Randy
>
>
> On Friday, January 25, 2019, 2:32:47 PM PST, Tom Beecher wrote:
>
> If I understand this thread correctly, the test caused no actual change in
> the routing table size or route announcement. That was all a result of the
> incorrect behavior of the software.
>
> Instead of throwing rocks, how about some data instead. We can collaborate
> and better understand the whole thing to make it better and move on to the
> next thing. Yelling about "North America" when 4 of the 7 listed
> researchers on the test are NOT IN NORTH AMERICA doesn't really help
> anything.
>
>
>
>
>
> On Thu, Jan 24, 2019 at 10:25 AM Ben Cooper  wrote:
> > Can you stop this?
> >
> > You caused again a massive prefix spike/flap, and as the internet is not
> > centered around NA (shock horror!) a number of operators in Asia and
> > Australia got affected by your “experiment” and had no idea what was
> > happening or why.
> >
> > Get a sandbox like every other researcher, as of now we have black holed
> > and filtered your whole ASN, and have recommended others do the same.
> >
> > On Wed, 23 Jan 2019 at 1:19 am, Italo Cunha  wrote:
> >> NANOG,
> >>
> >> This is a reminder that this experiment will resume tomorrow
> >> (Wednesday, Jan. 23rd). We will announce 184.164.224.0/24 carrying a
> >> BGP attribute of type 0xff (reserved for development) between 14:00
> >> and 14:15 GMT.
> >>
> >> On Tue, Dec 18, 2018 at 10:05 AM Italo Cunha  wrote:
> >>>
> >>> NANOG,
> >>>
> >>> We would like to inform you of an experiment to evaluate alternatives
> >>> for speeding up adoption of BGP route origin validation (research
> >>> paper with details [A]).
> >>>
> >>> Our plan is to announce prefix 184.164.224.0/24 with a valid
> >>> standards-compliant unassigned BGP attribute from routers operated by
> >>> the PEERING testbed [B, C]. The attribute will have flags 0xe0
> >>> (optional transitive [rfc4271, S4.3]), type 0xff (reserved for
> >>> development), and size 0x20 (256bits).
> >>>
> >>> Our collaborators recently ran an equivalent experiment with no
> >>> complaints or known issues [A], and so we do not anticipate any
> >>> arising. Back in 2010, an experiment using unassigned attributes by
> >>> RIPE and Duke University caused disruption in Internet routing due to
> >>> a bug in Cisco routers [D, CVE-2010-3035]. Since then, this and other
> >>> similar bugs have been patched [e.g., CVE-2013-6051], and new BGP
> >>> attributes have been assigned (BGPsec-path) and adopted (large
> >>> communities). We have successfully tested propagation of the
> >>> announcements on Cisco IOS-based routers running versions 12.2(33)SRA
> >>> and 15.3(1)S, Quagga 0.99.23.1 and 1.1.1, as well as BIRD 1.4.5 and
> >>> 1.6.3.
> >>>
> >>> We plan to announce 184.164.224.0/24 from 8 PEERING locations for a
> >>> predefined period of 15 minutes starting 14:30 GMT, from Monday to
> >>> Thursday, between the 7th and 22nd of January, 2019 (full schedule and
> >>> locations [E]). We will stop the experiment immediately in case any
> >>> issues arise.
> >>>
> >>> Although we do not expect the experiment to cause disruption, we
> >>> welcome feedback on its safety and especially on how to make it safer.
> >>> We can be reached at disco-experim...@googlegroups.com.
> >>>
> >>> Amir Herzberg, University of Connecticut
> >>> Ethan Katz-Bassett, Columbia University
> >>> Haya Shulman, Fraunhofer SIT
> >>> Ítalo Cunha, Universidade Federal de Minas Gerais
> >>> Michael Schapira, Hebrew University of Jerusalem
> >>> Tomas Hlavacek, Fraunhofer SIT
> >>> Yossi Gilad, MIT
> >>>
> >>> [A] https://conferences.sigcomm.org/hotnets/2018/program.html
> >>> [B] http://peering.usc.edu
> >>> [C] https://goo.gl/AFR1Cn
> >>> [D] https://labs.ripe.net/Members/erik/ripe-ncc-and-duke-university-bgp-experiment
> >>> [E] https://goo.gl/nJhmx1
> >>
> > --
> > Ben Cooper
> > Chief Executive Officer
> > PacketGG - Multicast
> > M(Telstra): 0410 411 301
> > M(Optus):  0434 336 743
> > E: b...@packet.gg & b...@multicast.net.au
> > W: https://packet.gg
> > W: https://multicast.net.au
> >
> >
> >
>
-- 
Regards,

Mark L. Tees


Re: BGP Experiment

2019-01-25 Thread Randy via NANOG
OP is yet to clarify how a single /24 advertisement caused a "massive-prefix 
spike/flap"; in OP's words.

The Experiment should continue.
-Randy


On Friday, January 25, 2019, 2:32:47 PM PST, Tom Beecher wrote:

If I understand this thread correctly, the test caused no actual change in the
routing table size or route announcement. That was all a result of the
incorrect behavior of the software.

Instead of throwing rocks, how about some data instead. We can collaborate and
better understand the whole thing to make it better and move on to the next
thing. Yelling about "North America" when 4 of the 7 listed researchers on the
test are NOT IN NORTH AMERICA doesn't really help anything.





On Thu, Jan 24, 2019 at 10:25 AM Ben Cooper  wrote:
> Can you stop this?
> 
> You caused again a massive prefix spike/flap, and as the internet is not
> centered around NA (shock horror!) a number of operators in Asia and
> Australia got affected by your “experiment” and had no idea what was happening
> or why.
> 
> Get a sandbox like every other researcher, as of now we have black holed and
> filtered your whole ASN, and have recommended others do the same.
> 
> On Wed, 23 Jan 2019 at 1:19 am, Italo Cunha  wrote:
>> NANOG,
>> 
>> This is a reminder that this experiment will resume tomorrow
>> (Wednesday, Jan. 23rd). We will announce 184.164.224.0/24 carrying a
>> BGP attribute of type 0xff (reserved for development) between 14:00
>> and 14:15 GMT.
>> 
>> On Tue, Dec 18, 2018 at 10:05 AM Italo Cunha  wrote:
>>>
>>> NANOG,
>>>
>>> We would like to inform you of an experiment to evaluate alternatives
>>> for speeding up adoption of BGP route origin validation (research
>>> paper with details [A]).
>>>
>>> Our plan is to announce prefix 184.164.224.0/24 with a valid
>>> standards-compliant unassigned BGP attribute from routers operated by
>>> the PEERING testbed [B, C]. The attribute will have flags 0xe0
>>> (optional transitive [rfc4271, S4.3]), type 0xff (reserved for
>>> development), and size 0x20 (256bits).
>>>
>>> Our collaborators recently ran an equivalent experiment with no
>>> complaints or known issues [A], and so we do not anticipate any
>>> arising. Back in 2010, an experiment using unassigned attributes by
>>> RIPE and Duke University caused disruption in Internet routing due to
>>> a bug in Cisco routers [D, CVE-2010-3035]. Since then, this and other
>>> similar bugs have been patched [e.g., CVE-2013-6051], and new BGP
>>> attributes have been assigned (BGPsec-path) and adopted (large
>>> communities). We have successfully tested propagation of the
>>> announcements on Cisco IOS-based routers running versions 12.2(33)SRA
>>> and 15.3(1)S, Quagga 0.99.23.1 and 1.1.1, as well as BIRD 1.4.5 and
>>> 1.6.3.
>>>
>>> We plan to announce 184.164.224.0/24 from 8 PEERING locations for a
>>> predefined period of 15 minutes starting 14:30 GMT, from Monday to
>>> Thursday, between the 7th and 22nd of January, 2019 (full schedule and
>>> locations [E]). We will stop the experiment immediately in case any
>>> issues arise.
>>>
>>> Although we do not expect the experiment to cause disruption, we
>>> welcome feedback on its safety and especially on how to make it safer.
>>> We can be reached at disco-experim...@googlegroups.com.
>>>
>>> Amir Herzberg, University of Connecticut
>>> Ethan Katz-Bassett, Columbia University
>>> Haya Shulman, Fraunhofer SIT
>>> Ítalo Cunha, Universidade Federal de Minas Gerais
>>> Michael Schapira, Hebrew University of Jerusalem
>>> Tomas Hlavacek, Fraunhofer SIT
>>> Yossi Gilad, MIT
>>>
>>> [A] https://conferences.sigcomm.org/hotnets/2018/program.html
>>> [B] http://peering.usc.edu
>>> [C] https://goo.gl/AFR1Cn
>>> [D] https://labs.ripe.net/Members/erik/ripe-ncc-and-duke-university-bgp-experiment
>>> [E] https://goo.gl/nJhmx1
>> 
> -- 
> Ben Cooper
> Chief Executive Officer
> PacketGG - Multicast
> M(Telstra): 0410 411 301
> M(Optus):  0434 336 743
> E: b...@packet.gg & b...@multicast.net.au
> W: https://packet.gg
> W: https://multicast.net.au
> 
> 
> 


Re: BGP Experiment

2019-01-25 Thread Tom Beecher
If I understand this thread correctly, the test caused no actual change in
the routing table size or route announcement. That was all a result of the
incorrect behavior of the software.

Instead of throwing rocks, how about some data? We can collaborate
to better understand the whole thing, make it better, and move on to the
next thing. Yelling about "North America" when 4 of the 7 listed
researchers on the test are NOT IN NORTH AMERICA doesn't really help
anything.





On Thu, Jan 24, 2019 at 10:25 AM Ben Cooper  wrote:

> Can you stop this?
>
> You caused again a massive prefix spike/flap, and as the internet is not
> centered around NA (shock horror!) a number of operators in Asia and
> Australia got affected by your “experiment” and had no idea what was
> happening or why.
>
> Get a sandbox like every other researcher, as of now we have black holed
> and filtered your whole ASN, and have recommended others do the same.
>
> On Wed, 23 Jan 2019 at 1:19 am, Italo Cunha  wrote:
>
>> NANOG,
>>
>> This is a reminder that this experiment will resume tomorrow
>> (Wednesday, Jan. 23rd). We will announce 184.164.224.0/24 carrying a
>> BGP attribute of type 0xff (reserved for development) between 14:00
>> and 14:15 GMT.
>>
>> On Tue, Dec 18, 2018 at 10:05 AM Italo Cunha  wrote:
>> >
>> > NANOG,
>> >
>> > We would like to inform you of an experiment to evaluate alternatives
>> > for speeding up adoption of BGP route origin validation (research
>> > paper with details [A]).
>> >
>> > Our plan is to announce prefix 184.164.224.0/24 with a valid
>> > standards-compliant unassigned BGP attribute from routers operated by
>> > the PEERING testbed [B, C]. The attribute will have flags 0xe0
>> > (optional transitive [rfc4271, S4.3]), type 0xff (reserved for
>> > development), and size 0x20 (256bits).
>> >
>> > Our collaborators recently ran an equivalent experiment with no
>> > complaints or known issues [A], and so we do not anticipate any
>> > arising. Back in 2010, an experiment using unassigned attributes by
>> > RIPE and Duke University caused disruption in Internet routing due to
>> > a bug in Cisco routers [D, CVE-2010-3035]. Since then, this and other
>> > similar bugs have been patched [e.g., CVE-2013-6051], and new BGP
>> > attributes have been assigned (BGPsec-path) and adopted (large
>> > communities). We have successfully tested propagation of the
>> > announcements on Cisco IOS-based routers running versions 12.2(33)SRA
>> > and 15.3(1)S, Quagga 0.99.23.1 and 1.1.1, as well as BIRD 1.4.5 and
>> > 1.6.3.
>> >
>> > We plan to announce 184.164.224.0/24 from 8 PEERING locations for a
>> > predefined period of 15 minutes starting 14:30 GMT, from Monday to
>> > Thursday, between the 7th and 22nd of January, 2019 (full schedule and
>> > locations [E]). We will stop the experiment immediately in case any
>> > issues arise.
>> >
>> > Although we do not expect the experiment to cause disruption, we
>> > welcome feedback on its safety and especially on how to make it safer.
>> > We can be reached at disco-experim...@googlegroups.com.
>> >
>> > Amir Herzberg, University of Connecticut
>> > Ethan Katz-Bassett, Columbia University
>> > Haya Shulman, Fraunhofer SIT
>> > Ítalo Cunha, Universidade Federal de Minas Gerais
>> > Michael Schapira, Hebrew University of Jerusalem
>> > Tomas Hlavacek, Fraunhofer SIT
>> > Yossi Gilad, MIT
>> >
>> > [A] https://conferences.sigcomm.org/hotnets/2018/program.html
>> > [B] http://peering.usc.edu
>> > [C] https://goo.gl/AFR1Cn
>> > [D]
>> https://labs.ripe.net/Members/erik/ripe-ncc-and-duke-university-bgp-experiment
>> > [E] https://goo.gl/nJhmx1
>>
> --
> Ben Cooper
> Chief Executive Officer
> PacketGG - Multicast
> M(Telstra): 0410 411 301
> M(Optus):  0434 336 743
> E: b...@packet.gg & b...@multicast.net.au
> W: https://packet.gg
> W: https://multicast.net.au
>
>


Re: BGP Experiment

2019-01-25 Thread Jakob Heitz (jheitz) via NANOG
It does, Ytti. And not just in testing. In feature development too.
Often in design discussions, someone pipes up: "someone does bla bla,
Let's not break it". One I remember from years ago was setting two
route reflectors as clients of each other and thinking route reflection
wasn't designed for that. It's being aware of such customer "creativity"
that keeps us on our toes.

Regards,
Jakob.

-Original Message-
From: Saku Ytti 

A lot of vendors, maybe all, accept your configurations and test them for
releases. I think this is the only viable solution vendors have for
blackbox testing: gather configs from customers and test those, instead of
trying to guess what to test.
I've done that with Cisco in two companies; unfortunately I can't
really tell if it impacted quality, but I like to think it did.


-- 
  ++ytti


Re: BGP Experiment

2019-01-24 Thread Saku Ytti
On Thu, 24 Jan 2019 at 18:43,  wrote:

> We fight with that all the time.
> I'd say that of the whole Design->Certify->Deploy->Verify->Monitor service
> lifecycle time budget, service certification testing is almost half.
> That's why I'm so interested in a model-driven design and testing approach.

This shop has 100% automated blackbox testing, and still they have to
cherry-pick what to test. Do you have statistics on how often you find
show-stopper issues and how far into the test they were found? I
expect this to be an exponential curve: upgrading the box, getting your
signalling protocols up, and pushing one packet through each service you
sell is easy and fast, and I wonder whether a massive amount of further
work increases confidence significantly beyond that. The issues I tend to
find in production are issues which are not trivial to recreate in the lab,
even once we know what they are, which implies that finding them a priori
is a bit of a naive expectation. So, assumptions:

a) blackbox testing has exponentially diminishing returns; quickly you
need to expend massively more effort to gain slightly more confidence
b) you can never say 'x works', you can only say 'I found a way to
confirm x is not broken in this very specific case'; the way x will
end up being broken may be very complex
c) if recreating issues you know about is hard, then finding issues
you don't know about is massively more difficult
d) testing likely increases your comfort to deploy more than it
increases the probability of success

Hopefully we'll enter a NOS future where we download the NOS from GitHub
and compile it for our devices, allowing the whole community to contribute
to unit testing and use-cases and to run minimal-bug-surface code in your
environment.
I see very little future in blackbox testing a vendor NOS at the operator
site, beyond a quick poke in the lab. Seems like poor value. Rather have a
pessimistic deployment plan: lab => staging => 2-3 low-risk sites =>
2-3 high-risk sites => slow roll up

> I really need an ever-growing library of test cases that the automation
> will churn through with very little human intervention, in order to reduce
> testing from months to days, or weeks at least.

A lot of vendors, maybe all, accept your configurations and test them for
releases. I think this is the only viable solution vendors have for
blackbox testing: gather configs from customers and test those, instead of
trying to guess what to test.
I've done that with Cisco in two companies; unfortunately I can't
really tell if it impacted quality, but I like to think it did.





-- 
  ++ytti


Re: BGP Experiment

2019-01-24 Thread valdis . kletnieks
On Thu, 24 Jan 2019 04:00:27 +1100, Ben Cooper said:

> You caused again a massive prefix spike/flap,

That's twice now you've said that without any numbers or details.

Care to explain what you mean by "massive" in a world where the IPv4 table has
like 700K+ routes? And as perceived by what point(s) in the topology?

Knowing where there are pockets of network admins shooting themselves in the
foot drastically improves the ability of organizations like NetDoctors Without
Borders to give proper aid where needed...




Re: BGP Experiment

2019-01-24 Thread Mike Hale
Or you could simply fix your gear rather than leaving a big hole in your
infrastructure.

*shrug*

On Thu, Jan 24, 2019, 7:25 AM Ben Cooper wrote:

> Can you stop this?
>
> You caused again a massive prefix spike/flap, and as the internet is not
> centered around NA (shock horror!) a number of operators in Asia and
> Australia got affected by your “experiment” and had no idea what was
> happening or why.
>
> Get a sandbox like every other researcher, as of now we have black holed
> and filtered your whole ASN, and have recommended others do the same.
>
> On Wed, 23 Jan 2019 at 1:19 am, Italo Cunha  wrote:
>
>> NANOG,
>>
>> This is a reminder that this experiment will resume tomorrow
>> (Wednesday, Jan. 23rd). We will announce 184.164.224.0/24 carrying a
>> BGP attribute of type 0xff (reserved for development) between 14:00
>> and 14:15 GMT.
>>
>> On Tue, Dec 18, 2018 at 10:05 AM Italo Cunha  wrote:
>> >
>> > NANOG,
>> >
>> > We would like to inform you of an experiment to evaluate alternatives
>> > for speeding up adoption of BGP route origin validation (research
>> > paper with details [A]).
>> >
>> > Our plan is to announce prefix 184.164.224.0/24 with a valid
>> > standards-compliant unassigned BGP attribute from routers operated by
>> > the PEERING testbed [B, C]. The attribute will have flags 0xe0
>> > (optional transitive [rfc4271, S4.3]), type 0xff (reserved for
>> > development), and size 0x20 (256bits).
>> >
>> > Our collaborators recently ran an equivalent experiment with no
>> > complaints or known issues [A], and so we do not anticipate any
>> > arising. Back in 2010, an experiment using unassigned attributes by
>> > RIPE and Duke University caused disruption in Internet routing due to
>> > a bug in Cisco routers [D, CVE-2010-3035]. Since then, this and other
>> > similar bugs have been patched [e.g., CVE-2013-6051], and new BGP
>> > attributes have been assigned (BGPsec-path) and adopted (large
>> > communities). We have successfully tested propagation of the
>> > announcements on Cisco IOS-based routers running versions 12.2(33)SRA
>> > and 15.3(1)S, Quagga 0.99.23.1 and 1.1.1, as well as BIRD 1.4.5 and
>> > 1.6.3.
>> >
>> > We plan to announce 184.164.224.0/24 from 8 PEERING locations for a
>> > predefined period of 15 minutes starting 14:30 GMT, from Monday to
>> > Thursday, between the 7th and 22nd of January, 2019 (full schedule and
>> > locations [E]). We will stop the experiment immediately in case any
>> > issues arise.
>> >
>> > Although we do not expect the experiment to cause disruption, we
>> > welcome feedback on its safety and especially on how to make it safer.
>> > We can be reached at disco-experim...@googlegroups.com.
>> >
>> > Amir Herzberg, University of Connecticut
>> > Ethan Katz-Bassett, Columbia University
>> > Haya Shulman, Fraunhofer SIT
>> > Ítalo Cunha, Universidade Federal de Minas Gerais
>> > Michael Schapira, Hebrew University of Jerusalem
>> > Tomas Hlavacek, Fraunhofer SIT
>> > Yossi Gilad, MIT
>> >
>> > [A] https://conferences.sigcomm.org/hotnets/2018/program.html
>> > [B] http://peering.usc.edu
>> > [C] https://goo.gl/AFR1Cn
>> > [D]
>> https://labs.ripe.net/Members/erik/ripe-ncc-and-duke-university-bgp-experiment
>> > [E] https://goo.gl/nJhmx1
>>
> --
> Ben Cooper
> Chief Executive Officer
> PacketGG - Multicast
> M(Telstra): 0410 411 301
> M(Optus):  0434 336 743
> E: b...@packet.gg & b...@multicast.net.au
> W: https://packet.gg
> W: https://multicast.net.au
>
>


RE: BGP Experiment

2019-01-24 Thread adamv0025
> From: Saku Ytti 
> Sent: Thursday, January 24, 2019 4:28 PM
> 
> On Thu, 24 Jan 2019 at 17:52,  wrote:
> 
> > This actually makes me think that it might be worthwhile including
> > these types of test in the regression testing suite.
> 
> I seem to recall one newish entrant to SP market explaining that they are
> limited by wall-time in blackbox testing. They would have no particular
> challenge testing everything, but the number of permutations and the wall-time
> to execute a single test simply make it impossible to test comprehensively. So
> if you can't test everything, what do you test? How do you predict what is
> more likely to be broken?
> 
We fight with that all the time.
I'd say that of the whole Design->Certify->Deploy->Verify->Monitor service
lifecycle time budget, service certification testing is almost half.
That's why I'm so interested in a model-driven design and testing approach.
I really need an ever-growing library of test cases that the automation
will churn through with very little human intervention, in order to reduce
testing from months to days, or weeks at least.

> There are some commercial BGP fuzzers, I've only tested one of them:
> https://www.synopsys.com/software-integrity/security-testing/fuzz-
> testing/defensics/protocols/bgp4-server.html
> 
Thank you very much for the link.

adam



Re: BGP Experiment

2019-01-24 Thread Saku Ytti
On Thu, 24 Jan 2019 at 17:52,  wrote:

> This actually makes me think that it might be worthwhile including these
> types of test in the regression testing suite.

I seem to recall one newish entrant to SP market explaining that they
are limited by wall-time in blackbox testing. They would have no
particular challenge testing everything, but the number of
permutations and the wall-time to execute a single test simply make it
impossible to test comprehensively. So if you can't test everything,
what do you test? How do you predict what is more likely to be broken?

Focus on MTBF is a fool's errand; maybe someone like FB, AMZN, or MSFT can
do statistical analysis on the outcome of a change, but the rest of us are
just guessing that what we did increased MTBF; we don't have enough
failures to actually know.
Focus should be on MTTR.
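
One way to read the MTTR point, as a rough sketch: with the usual
steady-state formula availability = MTBF / (MTBF + MTTR), halving MTTR buys
the same availability as doubling MTBF, and MTTR is the one you can measure
on every incident. The hour figures below are hypothetical, chosen only for
illustration.

```python
def availability(mtbf_hours, mttr_hours):
    """Steady-state availability: fraction of time the service is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# Hypothetical baseline: one failure every 4000 hours, 4 hours to repair.
base = availability(4000.0, 4.0)

# Two ways to improve: fail half as often, or repair twice as fast.
double_mtbf = availability(8000.0, 4.0)
half_mttr = availability(4000.0, 2.0)

if __name__ == "__main__":
    print(f"baseline     {base:.6f}")
    print(f"double MTBF  {double_mtbf:.6f}")
    print(f"half MTTR    {half_mttr:.6f}")
```

Both improvements depend only on the ratio MTTR/MTBF, so they land on the
same number; the difference is which one an operator can actually observe.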

There are some commercial BGP fuzzers, I've only tested one of them:
https://www.synopsys.com/software-integrity/security-testing/fuzz-testing/defensics/protocols/bgp4-server.html

-- 
  ++ytti


RE: BGP Experiment

2019-01-24 Thread adamv0025
> From: Brian Kantor
> Sent: Thursday, January 24, 2019 3:58 PM
> 
> I agree.
> 
> It seems to me that testing with almost-valid data (well formed, but with
> disallowed values) as well as fuzz-testing are essential parts of software
> quality control.
>
To be frank,
I have blasted packets at the platforms from Ixias and Spirents to see if they
break gracefully, loaded millions of routes and thousands of VRFs and BGP
sessions to see what happens, even designed the backbones with separate
Internet and VPN RRs, and enabled enhanced error handling, but have I ever
sat down and generated BGP packets with slight deviations to see how the BGP
session, process or whole RPD copes with these?
I've got to say no, never.
And judging from the overly positive (or even negative) responses to the BGP
Experiment I'm not alone in this.
Otherwise everyone would be like, nah I don't care as I have all my bases
covered and I know how my BGP behaves when processing exceptions.


adam  




Re: BGP Experiment

2019-01-24 Thread Brian Kantor
On Thu, Jan 24, 2019 at 03:49:46PM -, adamv0...@netconsultings.com wrote:
> This actually makes me think that it might be worthwhile including these
> types of test in the regression testing suite.
> So that every time we evaluate new code or vendor we don't only test for
> functionality, performance and scalability, but also for robustness 
> i.e. sending a whole heap of trash down the sockets which are accessible
> from the Internet (via the iACL holes), to limit the scope of the test.
> 
> Rather than relying on experiments to notify us the hard way that something
> is not right.
> 
> adam

I agree.

It seems to me that testing with almost-valid data (well formed,
but with disallowed values) as well as fuzz-testing are essential
parts of software quality control.
- Brian
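
A minimal sketch of what "almost-valid" test input could look like for a
single BGP path attribute, assuming the flags / type code / one-octet length
/ value layout from RFC 4271. The function names and variant labels are
hypothetical, and this only builds byte strings; it does not open a BGP
session or replace a real fuzzer.

```python
import random

def build_attribute(flags, type_code, value):
    """Encode one BGP path attribute using the one-octet length form."""
    if len(value) > 255:
        raise ValueError("value needs the extended-length encoding")
    return bytes([flags, type_code, len(value)]) + value

def near_valid_variants(flags, type_code, value):
    """Yield (label, bytes): well-formed but unusual attributes, then one malformed."""
    yield "baseline", build_attribute(flags, type_code, value)
    yield "empty-value", build_attribute(flags, type_code, b"")
    yield "max-length", build_attribute(flags, type_code, bytes(255))
    # Clear the Transitive bit (0x40) while keeping everything else intact.
    yield "non-transitive", build_attribute(flags & ~0x40 & 0xFF, type_code, value)
    # Malformed on purpose: declared length disagrees with the bytes that follow.
    bad_len = (len(value) + 1) % 256
    yield "length-mismatch", bytes([flags, type_code, bad_len]) + value

def random_variants(seed, count, type_code=0xFF):
    """Yield well-formed optional transitive attributes with random values."""
    rng = random.Random(seed)  # seeded so a failing case can be replayed
    for _ in range(count):
        size = rng.randrange(0, 64)
        value = bytes(rng.randrange(256) for _ in range(size))
        yield build_attribute(0xC0, type_code, value)

if __name__ == "__main__":
    for label, raw in near_valid_variants(0xC0, 0xFF, bytes(32)):
        print(label, raw[:3].hex(), len(raw))
```

Seeding the random generator is a deliberate choice here: a variant that
upsets a device under test can be regenerated exactly for the bug report.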



RE: BGP Experiment

2019-01-24 Thread adamv0025
> From: James Jun
> Sent: Wednesday, January 23, 2019 7:02 PM
> 
> Agreed;  Please resume the experiment.  We're all operators here, and we
> MUST have confidence that BGP speakers on our network are compliant with
> protocol specification.  Experiments like this are opportunities for a
> real-life validation of how our devices handle messages that are out of the
> norm, and help us identify issues.
> 
> Kudos to researchers by the way, for sending courtesy announcements in
> advance, and testing against some common platforms available to them
> (Cisco, Quagga & BIRD) prior to the experiment.
> 
This actually makes me think that it might be worthwhile including these
types of test in the regression testing suite.
So that every time we evaluate new code or vendor we don't only test for
functionality, performance and scalability, but also for robustness 
i.e. sending a whole heap of trash down the sockets which are accessible
from the Internet (via the iACL holes), to limit the scope of the test.

Rather than relying on experiments to notify us the hard way that something
is not right.

adam



Re: BGP Experiment

2019-01-24 Thread Ben Cooper
Can you stop this?

You caused again a massive prefix spike/flap, and as the internet is not
centered around NA (shock horror!) a number of operators in Asia and
Australia got affected by your “experiment” and had no idea what was
happening or why.

Get a sandbox like every other researcher, as of now we have black holed
and filtered your whole ASN, and have recommended others do the same.

On Wed, 23 Jan 2019 at 1:19 am, Italo Cunha  wrote:

> NANOG,
>
> This is a reminder that this experiment will resume tomorrow
> (Wednesday, Jan. 23rd). We will announce 184.164.224.0/24 carrying a
> BGP attribute of type 0xff (reserved for development) between 14:00
> and 14:15 GMT.
>
> On Tue, Dec 18, 2018 at 10:05 AM Italo Cunha  wrote:
> >
> > NANOG,
> >
> > We would like to inform you of an experiment to evaluate alternatives
> > for speeding up adoption of BGP route origin validation (research
> > paper with details [A]).
> >
> > Our plan is to announce prefix 184.164.224.0/24 with a valid
> > standards-compliant unassigned BGP attribute from routers operated by
> > the PEERING testbed [B, C]. The attribute will have flags 0xe0
> > (optional transitive [rfc4271, S4.3]), type 0xff (reserved for
> > development), and size 0x20 (256bits).
> >
> > Our collaborators recently ran an equivalent experiment with no
> > complaints or known issues [A], and so we do not anticipate any
> > arising. Back in 2010, an experiment using unassigned attributes by
> > RIPE and Duke University caused disruption in Internet routing due to
> > a bug in Cisco routers [D, CVE-2010-3035]. Since then, this and other
> > similar bugs have been patched [e.g., CVE-2013-6051], and new BGP
> > attributes have been assigned (BGPsec-path) and adopted (large
> > communities). We have successfully tested propagation of the
> > announcements on Cisco IOS-based routers running versions 12.2(33)SRA
> > and 15.3(1)S, Quagga 0.99.23.1 and 1.1.1, as well as BIRD 1.4.5 and
> > 1.6.3.
> >
> > We plan to announce 184.164.224.0/24 from 8 PEERING locations for a
> > predefined period of 15 minutes starting 14:30 GMT, from Monday to
> > Thursday, between the 7th and 22nd of January, 2019 (full schedule and
> > locations [E]). We will stop the experiment immediately in case any
> > issues arise.
> >
> > Although we do not expect the experiment to cause disruption, we
> > welcome feedback on its safety and especially on how to make it safer.
> > We can be reached at disco-experim...@googlegroups.com.
> >
> > Amir Herzberg, University of Connecticut
> > Ethan Katz-Bassett, Columbia University
> > Haya Shulman, Fraunhofer SIT
> > Ítalo Cunha, Universidade Federal de Minas Gerais
> > Michael Schapira, Hebrew University of Jerusalem
> > Tomas Hlavacek, Fraunhofer SIT
> > Yossi Gilad, MIT
> >
> > [A] https://conferences.sigcomm.org/hotnets/2018/program.html
> > [B] http://peering.usc.edu
> > [C] https://goo.gl/AFR1Cn
> > [D]
> https://labs.ripe.net/Members/erik/ripe-ncc-and-duke-university-bgp-experiment
> > [E] https://goo.gl/nJhmx1
>
-- 
Ben Cooper
Chief Executive Officer
PacketGG - Multicast
M(Telstra): 0410 411 301
M(Optus):  0434 336 743
E: b...@packet.gg & b...@multicast.net.au
W: https://packet.gg
W: https://multicast.net.au


Re: Global statistics during the experiment (was Re: BGP Experiment)

2019-01-24 Thread Töma Gavrichenkov
On Thu, Jan 24, 2019, 5:40 PM Mike Tancsa  wrote:

> On 1/23/2019 8:27 PM, Töma Gavrichenkov wrote:
> > +1. I've yet to see any disruptions caused by the experiment in my area.
> >
> Speaking of which, were there any statistics gathered and published
> before, during and after the experiment about the size of the global
> routing table and how many ASNs were impacted ?
>

No, sorry. Like I said, Radar has seen nothing more than few minor
interruptions during that period. If we detected anything serious, we would
have posted that to our blog.

I wonder if the experiment organizers have also collected any data.

--
Töma



Global statistics during the experiment (was Re: BGP Experiment)

2019-01-24 Thread Mike Tancsa
On 1/23/2019 8:27 PM, Töma Gavrichenkov wrote:
>
>> Replying to throw in my support behind continuing the experiment as well.
> +1. I've yet to see any disruptions caused by the experiment in my area.
>
Speaking of which, were there any statistics gathered and published
before, during and after the experiment about the size of the global
routing table and how many ASNs were impacted ?

    ---Mike



-- 
---
Mike Tancsa, tel +1 519 651 3400 x203
Sentex Communications, m...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada   



Re: BGP Experiment

2019-01-23 Thread William Herrin
On Wed, Jan 23, 2019 at 10:16 AM Filip Hruska  wrote:
> This experiment should be continued.
> It's the only way to get people to patch stuff.

Agreed. But be gentle. Wait a couple months between repeats to give
folks a generous amount of time to patch their gear. If you create the
emergency that a hacker could but hasn't, you've done the hacker's job
for him.

Regards,
Bill Herrin


-- 
William Herrin  her...@dirtside.com  b...@herrin.us
Dirtside Systems . Web: 


Re: BGP Experiment

2019-01-23 Thread Mark Tinka



On 23/Jan/19 21:01, James Jun wrote:

> Agreed;  Please resume the experiment.  We're all operators here, and we MUST
> have confidence that BGP speakers on our network are compliant with protocol
> specification.  Experiments like this are opportunities for a real-life
> validation of how our devices handle messages that are out of the norm, and
> help us identify issues.
>
> Kudos to researchers by the way, for sending courtesy announcements in
> advance, and testing against some common platforms available to them (Cisco,
> Quagga & BIRD) prior to the experiment.

+1.

Mark.


Re: BGP Experiment

2019-01-23 Thread Töma Gavrichenkov
On Thu, Jan 24, 2019 at 3:23 AM Paul S.  wrote:
> As others have noted, the right target to be angry at is your
> equipment vendor.

...whose name I'd personally be quite delighted to finally hear.
Is it just the same FRR that got a patch someone failed to apply, or
is the issue more serious this time?

> Replying to throw in my support behind continuing the experiment as well.

+1. I've yet to see any disruptions caused by the experiment in my area.

--
Töma


Re: BGP Experiment

2019-01-23 Thread Paul S.

Replying to throw in my support behind continuing the experiment as well.

Assurance that my gear will NOT fall over under adversarial situations 
is paramount, thank you for the research that you're doing to ensure that.


Ben, you may wish to re-evaluate how "rock solid" [1] your networking 
truly is if you're being taken down by random BGP updates.


As others have noted, the right target to be angry at is your equipment 
vendor.


[1]: https://packet.gg/

On 1/24/2019 02:19 AM, Italo Cunha wrote:

> Ben, NANOG,
>
> We have canceled this experiment permanently.
>
> On Wed, Jan 23, 2019 at 12:00 PM Ben Cooper wrote:
>
>> Can you stop this?
>>
>> You caused again a massive prefix spike/flap, and as the internet
>> is not centered around NA (shock horror!) a number of operators in
>> Asia and Australia got affected by your “experiment” and had no idea
>> what was happening or why.
>>
>> Get a sandbox like every other researcher, as of now we have black
>> holed and filtered your whole ASN, and have recommended others do
>> the same.





Re: BGP Experiment

2019-01-23 Thread Owen DeLong



> On Jan 23, 2019, at 10:16 , Filip Hruska  wrote:
> 
> This experiment should be continued.
> 
> It's the only way to get people to patch stuff.
> And if all it takes to break things is a single announcement, then that's
> something that should definitely be fixed.
> 
> Blacklisting an ASN is not a solution, that's ignorance.

Actually, at the point where you blacklist the ASN, you’ve moved from ignorance 
to willful ignorance (aka stupidity).

Owen



Re: BGP Experiment

2019-01-23 Thread Christoffer Hansen

On 23/01/2019 20:01, James Jun wrote:
> Kudos to researchers by the way, for sending courtesy announcements in
> advance, and testing against some common platforms available to them (Cisco,
> Quagga & BIRD) prior to the experiment.
On 18/12/2018 16:05, Italo Cunha wrote:
> We have successfully tested propagation of the
> announcements on Cisco IOS-based routers running versions 12.2(33)SRA
> and 15.3(1)S, Quagga 0.99.23.1 and 1.1.1, as well as BIRD 1.4.5 and
> 1.6.3.

If tests were done _only_ on Quagga, BIRD, and Cisco IOS, I can only
conclude that a limited set has been covered. (But then again, $_vendors and
$_versions of $_software cannot be counted in small numbers.)





Re: BGP Experiment

2019-01-23 Thread James Jun
On Wed, Jan 23, 2019 at 06:45:50PM +, Nikolas Geyer wrote:
> Throwing my support behind continuing the experiment also. A singular
> complaint from a company advertising unallocated ASN and IPv4 resources (the
> irony) does not warrant cessation of the experiment.

Agreed;  Please resume the experiment.  We're all operators here, and we MUST
have confidence that BGP speakers on our network are compliant with protocol
specification.  Experiments like this are opportunities for a real-life
validation of how our devices handle messages that are out of the norm, and
help us identify issues.

Kudos to researchers by the way, for sending courtesy announcements in advance,
and testing against some common platforms available to them (Cisco, Quagga &
BIRD) prior to the experiment.

James


Re: BGP Experiment

2019-01-23 Thread Christoffer Hansen

> On Wed, Jan 23, 2019 at 12:00 PM Ben Cooper  wrote:
> 
>> Can you stop this?
>>
>> You caused again a massive prefix spike/flap, and as the internet is not
>> centered around NA (shock horror!) a number of operators in Asia and
>> Australia got affected by your “experiment” and had no idea what was
>> happening or why.

You could(?) be helpful by propagating the announcement to other $lists
or by pointing out to the authors that they are not spreading their
announcements about the experiment widely enough.

>> Get a sandbox like every other researcher, as of now we have black holed
>> and filtered your whole ASN, and have recommended others do the same.

The earlier notice about today's prefix announcements should probably
have gone to more mailing lists at RIRs and NOGs.
(I encountered an FRR user today, too, who had missed the previous
announcements.)

-Christoffer






Re: BGP Experiment

2019-01-23 Thread Nikolas Geyer
Throwing my support behind continuing the experiment also. A singular complaint 
from a company advertising unallocated ASN and IPv4 resources (the irony) does 
not warrant cessation of the experiment.

The experiment is in compliance with the relevant RFCs, the affected “vendor” 
has released fixed software and announced it to their notifications list. I can 
only hope this and future research continues.

> On Jan 23, 2019, at 1:39 PM, Christoffer Hansen  
> wrote:
> 
> 
>> On 23/01/2019 18:19, Italo Cunha wrote:
>> We have canceled this experiment permanently.
> 
> Sad to hear! :/
> 
> My impression is that if you continue, more users will become aware of bugs
> in running BGP implementations. Not all follow announcements from $vendor;
> they will continue to run older broken code and complain to $vendor about
> errors with the software.


Re: BGP Experiment

2019-01-23 Thread Christoffer Hansen


On 23/01/2019 18:19, Italo Cunha wrote:
> We have canceled this experiment permanently.

Sad to hear! :/

My impression is that if you continue, more users will become aware of bugs
in running BGP implementations. Not all follow announcements from $vendor;
they will continue to run older broken code and complain to $vendor about
errors with the software.


RE: BGP Experiment

2019-01-23 Thread Naslund, Steve
Sorry.  Correction.

If it IS RFC compliant they should accept the attribute.  If it is NOT, they 
should drop (and maybe log it).
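
A minimal sketch of that rule, assuming the attribute layout and
unrecognized-attribute handling described in RFC 4271 (flags, type code,
one- or two-octet length, value). KNOWN_TYPES and the returned action
strings are hypothetical names for illustration, not any vendor's
implementation.

```python
import struct

OPTIONAL, TRANSITIVE, PARTIAL, EXT_LEN = 0x80, 0x40, 0x20, 0x10

# Hypothetical: the attribute type codes this toy speaker understands.
KNOWN_TYPES = {1, 2, 3, 4, 5, 6, 7, 8}

def parse_attribute(buf):
    """Return (flags, type_code, value) for the path attribute at the start of buf."""
    if len(buf) < 3:
        raise ValueError("truncated attribute header")
    flags, type_code = buf[0], buf[1]
    if flags & EXT_LEN:
        # Extended Length bit set: the length field is two octets.
        if len(buf) < 4:
            raise ValueError("truncated extended length")
        (length,) = struct.unpack("!H", buf[2:4])
        offset = 4
    else:
        length = buf[2]
        offset = 3
    value = buf[offset:offset + length]
    if len(value) != length:
        raise ValueError("attribute value shorter than declared length")
    return flags, type_code, value

def handle_attribute(flags, type_code):
    """Decide what to do with a well-formed attribute, by type and flags."""
    if type_code in KNOWN_TYPES:
        return "process"
    if not flags & OPTIONAL:
        return "error"  # unrecognized well-known attribute
    if flags & TRANSITIVE:
        return "propagate-with-partial"  # keep it, set Partial, pass it on
    return "ignore"  # unrecognized optional non-transitive: quietly drop

if __name__ == "__main__":
    # The experiment's attribute: flags 0xe0, type 0xff, 32 octets of value.
    raw = bytes([0xE0, 0xFF, 0x20]) + bytes(32)
    flags, type_code, value = parse_attribute(raw)
    print(handle_attribute(flags, type_code), len(value))
```

The point of the sketch is that a malformed attribute surfaces as a parse
error the caller can log and contain, while a well-formed unknown one is
never a reason to tear the session down.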

Steve

>Contact your hardware vendor.  That is not acceptable behavior.  If it is not
>RFC compliant they need to accept the attribute, if it's not RFC compliant
>they should gracefully ignore it.  Now we all know that anyone using that
>gear is vulnerable to a DoS attack.  Won't be long until anyone else sends
>that to you.

>Steven Naslund
>Chicago IL 



RE: BGP Experiment

2019-01-23 Thread Naslund, Steve
Agreed. Do you think you will not see that attribute again now that the public
knows that you are vulnerable to this DoS method?  Expect to see an attack
based on this method shortly.  They just did you a favor by exposing your
vulnerability; you should take it as such.  I would be putting in emergency
patches tonight if available.

Steven Naslund
Chicago IL
>This experiment should be continued.
>
>It's the only way to get people to patch stuff.
>And if all it takes to break things is a single announcement, then that's
>something that should definitely be fixed.
>
>Blacklisting an ASN is not a solution, that's ignorance.
>
>Regards,
>Filip Hruska


RE: BGP Experiment

2019-01-23 Thread Naslund, Steve
Contact your hardware vendor.  That is not acceptable behavior.  If it is not 
RFC compliant they need to accept the attribute, if it's not RFC compliant they 
should gracefully ignore it.  Now we all know that anyone using that gear is 
vulnerable to a DoS attack.  Won't be long until anyone else sends that to you.

Steven Naslund
Chicago IL 

>Well, here, when you receive this particular attribute and if you're
>vulnerable, your equipment automatically gets disconnected from the Internet,
>so the issue kinda solves itself.
>
>--
>Töma


Re: BGP Experiment

2019-01-23 Thread Filip Hruska
This experiment should be continued.

It's the only way to get people to patch stuff.
And if all it takes to break things is a single announcement, then that's
something that should definitely be fixed.

Blacklisting an ASN is not a solution, that's ignorance.

Regards,
Filip Hruska

On 23 January 2019 18:19:09 CET, Italo Cunha  wrote:
>Ben, NANOG,
>
>We have canceled this experiment permanently.
>
>On Wed, Jan 23, 2019 at 12:00 PM Ben Cooper  wrote:
>
>> Can you stop this?
>>
>> You caused again a massive prefix spike/flap, and as the internet is not
>> centered around NA (shock horror!) a number of operators in Asia and
>> Australia got affected by your “experiment” and had no idea what was
>> happening or why.
>>
>> Get a sandbox like every other researcher, as of now we have black holed
>> and filtered your whole ASN, and have recommended others do the same.
>>
>> On Wed, 23 Jan 2019 at 1:19 am, Italo Cunha wrote:
>>
>>> NANOG,
>>>
>>> This is a reminder that this experiment will resume tomorrow
>>> (Wednesday, Jan. 23rd). We will announce 184.164.224.0/24 carrying a
>>> BGP attribute of type 0xff (reserved for development) between 14:00
>>> and 14:15 GMT.
>>>
>>> On Tue, Dec 18, 2018 at 10:05 AM Italo Cunha 
>wrote:
>>> >
>>> > NANOG,
>>> >
>>> > We would like to inform you of an experiment to evaluate
>alternatives
>>> > for speeding up adoption of BGP route origin validation (research
>>> > paper with details [A]).
>>> >
>>> > Our plan is to announce prefix 184.164.224.0/24 with a valid
>>> > standards-compliant unassigned BGP attribute from routers operated
>by
>>> > the PEERING testbed [B, C]. The attribute will have flags 0xe0
>>> > (optional transitive [rfc4271, S4.3]), type 0xff (reserved for
>>> > development), and size 0x20 (256bits).
>>> >
>>> > Our collaborators recently ran an equivalent experiment with no
>>> > complaints or known issues [A], and so we do not anticipate any
>>> > arising. Back in 2010, an experiment using unassigned attributes
>by
>>> > RIPE and Duke University caused disruption in Internet routing due
>to
>>> > a bug in Cisco routers [D, CVE-2010-3035]. Since then, this and
>other
>>> > similar bugs have been patched [e.g., CVE-2013-6051], and new BGP
>>> > attributes have been assigned (BGPsec-path) and adopted (large
>>> > communities). We have successfully tested propagation of the
>>> > announcements on Cisco IOS-based routers running versions
>12.2(33)SRA
>>> > and 15.3(1)S, Quagga 0.99.23.1 and 1.1.1, as well as BIRD 1.4.5
>and
>>> > 1.6.3.
>>> >
>>> > We plan to announce 184.164.224.0/24 from 8 PEERING locations for
>a
>>> > predefined period of 15 minutes starting 14:30 GMT, from Monday to
>>> > Thursday, between the 7th and 22nd of January, 2019 (full schedule
>and
>>> > locations [E]). We will stop the experiment immediately in case
>any
>>> > issues arise.
>>> >
>>> > Although we do not expect the experiment to cause disruption, we
>>> > welcome feedback on its safety and especially on how to make it
>safer.
>>> > We can be reached at disco-experim...@googlegroups.com.
>>> >
>>> > Amir Herzberg, University of Connecticut
>>> > Ethan Katz-Bassett, Columbia University
>>> > Haya Shulman, Fraunhofer SIT
>>> > Ítalo Cunha, Universidade Federal de Minas Gerais
>>> > Michael Schapira, Hebrew University of Jerusalem
>>> > Tomas Hlavacek, Fraunhofer SIT
>>> > Yossi Gilad, MIT
>>> >
>>> > [A] https://conferences.sigcomm.org/hotnets/2018/program.html
>>> > [B] http://peering.usc.edu
>>> > [C] https://goo.gl/AFR1Cn
>>> > [D]
>>>
>https://labs.ripe.net/Members/erik/ripe-ncc-and-duke-university-bgp-experiment
>>> > [E] https://goo.gl/nJhmx1
>>>
>> --
>> Ben Cooper
>> Chief Executive Officer
>> PacketGG - Multicast
>> M(Telstra): 0410 411 301
>> M(Optus):  0434 336 743
>> E: b...@packet.gg & b...@multicast.net.au
>> W: https://packet.gg
>> W: https://multicast.net.au
>>


Re: BGP Experiment

2019-01-23 Thread Nick Hilliard

Töma Gavrichenkov wrote on 23/01/2019 18:00:

> What if next time e.g. the bad guys would be doing this? Would you
> urge them to also get a sandbox?


Send them a strongly worded memo.

If that doesn't work, threaten to send them a second.

Nick


Re: BGP Experiment

2019-01-23 Thread Töma Gavrichenkov
On Wed, Jan 23, 2019 at 9:05 PM Aled Morris via NANOG  wrote:
> I'd go further and say that as long as you're connected
> to the Internet, your equipment better be resilient when
> receiving packets with any combination of bits set,
> RFC compliant or not.

Well, here, when you receive this particular attribute and if you're
vulnerable, your equipment automatically gets disconnected from the
Internet, so the issue kinda solves itself.

--
Töma


Re: BGP Experiment

2019-01-23 Thread Aled Morris via NANOG
On Wed, 23 Jan 2019 at 17:58, Naslund, Steve  wrote:

> I hope you are as critical of your hardware vendor that cannot accept BGP4
> compliant attributes or have you just not updated your code?  You can black
> hole anything you want but as long as the “Internet” is sending you an RFC
> compliant BGP you better be able to handle it.
>


I'd go further and say that as long as you're connected to the Internet,
your equipment better be resilient when receiving packets with any
combination of bits set, RFC compliant or not.

Aled
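
One way to read Aled's point in code: a parser for the path attributes field should check every length against the bytes actually remaining and report a controlled error, so that unexpected input never causes an out-of-bounds read or a crash. The Python sketch below is illustrative only. `MalformedAttribute` and `parse_path_attributes` are hypothetical names, and what the caller does with the error (for example, dropping the route rather than the session) is a separate policy decision that this sketch leaves open.

```python
EXTENDED_LENGTH = 0x10  # RFC 4271: length field is two octets when set


class MalformedAttribute(Exception):
    """Raised when the attribute bytes are inconsistent or truncated."""


def parse_path_attributes(data: bytes) -> list[tuple[int, int, bytes]]:
    """Split a path attributes field into (flags, type, value) tuples."""
    attrs = []
    offset = 0
    while offset < len(data):
        # Flags and type code: two octets.
        if len(data) - offset < 2:
            raise MalformedAttribute("truncated attribute header")
        flags, type_code = data[offset], data[offset + 1]
        offset += 2

        # Length: one octet, or two if Extended Length is set.
        len_size = 2 if flags & EXTENDED_LENGTH else 1
        if len(data) - offset < len_size:
            raise MalformedAttribute("truncated length field")
        length = int.from_bytes(data[offset:offset + len_size], "big")
        offset += len_size

        # Never trust the declared length beyond what was received.
        if len(data) - offset < length:
            raise MalformedAttribute("declared length exceeds remaining data")
        attrs.append((flags, type_code, data[offset:offset + length]))
        offset += length
    return attrs


# An attribute shaped like the experiment's: flags 0xe0, type 0xff, 32 octets.
well_formed = bytes([0xE0, 0xFF, 0x20]) + bytes(32)
print(parse_path_attributes(well_formed))
```

Unknown type codes pass through this parser untouched. Only structurally inconsistent input raises, which keeps "I do not know this attribute" separate from "this message is broken".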


Re: BGP Experiment

2019-01-23 Thread Töma Gavrichenkov
> On Wed, Jan 23, 2019 at 12:00 PM Ben Cooper  wrote:
> You caused again a massive prefix spike/flap, and as the internet is not 
> centered around NA (shock horror!) a number of operators in Asia and 
> Australia go effected by your “expirment” and had no idea what was happening 
> or why.
>
> Get a sandbox like every other researcher

The whole thing reminds me of a decades old story which can be found
on Google by a search term "The Spider of Doom".

What if next time e.g. the bad guys would be doing this? Would you
urge them to also get a sandbox?

--
Töma


RE: BGP Experiment

2019-01-23 Thread Naslund, Steve
I hope you are as critical of your hardware vendor, which cannot accept BGP4 
compliant attributes. Or have you just not updated your code?  You can black 
hole anything you want, but as long as the “Internet” is sending you RFC 
compliant BGP, you had better be able to handle it.

Steven Naslund
Chicago IL

On Wed, Jan 23, 2019 at 12:00 PM Ben Cooper wrote:
Can you stop this?

You caused again a massive prefix spike/flap, and as the internet is not 
centered around NA (shock horror!) a number of operators in Asia and Australia 
go effected by your “expirment” and had no idea what was happening or why.

Get a sandbox like every other researcher, as of now we have black holed and 
filtered your whole ASN, and have reccomended others do the same.




Re: BGP Experiment

2019-01-23 Thread Eric Kuhnke
I would be very interested in hearing Ben's definition of something that is
"massive", if announcing or withdrawing a single /24 from the global
routing table constitutes, quote, "a massive prefix spike/flap".

Individual /24s are moved around all the time by fully automated systems.


On Wed, Jan 23, 2019 at 9:42 AM Job Snijders  wrote:

> Dear Ben, all,
>
> I'm not sure this experiment should be canceled. On the public Internet
> we MUST assume BGP speakers are compliant with the BGP-4 protocol.
> Broken BGP-4 speakers are what they are: broken. They must be fixed, or
> the operator must accept the consequences.
>
> "Get a sandbox like every other researcher" is not a fair statement, one
> can also posit "Get a compliant BGP-4 implementation like every other
> network operator".
>
> When bad guys explicitly seek to target these Asian and Australian
> operators you reference (who apparently have not upgraded to the vendor
> recommended release), using *valid* BGP updates, will a politely emailed
> request help resolve the situation? Of course not!
>
> Stopping the experiment is only treating symptoms, the root cause must
> be addressed: broken software.
>
> Kind regards,
>
> Job
>
> On Wed, Jan 23, 2019 at 12:19:09PM -0500, Italo Cunha wrote:
> > Ben, NANOG,
> >
> > We have canceled this experiment permanently.
> >
> > On Wed, Jan 23, 2019 at 12:00 PM Ben Cooper  wrote:
> >
> > > Can you stop this?
> > >
> > > You caused again a massive prefix spike/flap, and as the internet is
> not
> > > centered around NA (shock horror!) a number of operators in Asia and
> > > Australia go effected by your “expirment” and had no idea what was
> > > happening or why.
> > >
> > > Get a sandbox like every other researcher, as of now we have black
> holed
> > > and filtered your whole ASN, and have reccomended others do the same.
> > >
> > > On Wed, 23 Jan 2019 at 1:19 am, Italo Cunha  wrote:
> > >
> > >> NANOG,
> > >>
> > >> This is a reminder that this experiment will resume tomorrow
> > >> (Wednesday, Jan. 23rd). We will announce 184.164.224.0/24 carrying a
> > >> BGP attribute of type 0xff (reserved for development) between 14:00
> > >> and 14:15 GMT.
> > >>
> > >> On Tue, Dec 18, 2018 at 10:05 AM Italo Cunha 
> wrote:
> > >> >
> > >> > NANOG,
> > >> >
> > >> > We would like to inform you of an experiment to evaluate
> alternatives
> > >> > for speeding up adoption of BGP route origin validation (research
> > >> > paper with details [A]).
> > >> >
> > >> > Our plan is to announce prefix 184.164.224.0/24 with a valid
> > >> > standards-compliant unassigned BGP attribute from routers operated
> by
> > >> > the PEERING testbed [B, C]. The attribute will have flags 0xe0
> > >> > (optional transitive [rfc4271, S4.3]), type 0xff (reserved for
> > >> > development), and size 0x20 (256bits).
> > >> >
> > >> > Our collaborators recently ran an equivalent experiment with no
> > >> > complaints or known issues [A], and so we do not anticipate any
> > >> > arising. Back in 2010, an experiment using unassigned attributes by
> > >> > RIPE and Duke University caused disruption in Internet routing due
> to
> > >> > a bug in Cisco routers [D, CVE-2010-3035]. Since then, this and
> other
> > >> > similar bugs have been patched [e.g., CVE-2013-6051], and new BGP
> > >> > attributes have been assigned (BGPsec-path) and adopted (large
> > >> > communities). We have successfully tested propagation of the
> > >> > announcements on Cisco IOS-based routers running versions
> 12.2(33)SRA
> > >> > and 15.3(1)S, Quagga 0.99.23.1 and 1.1.1, as well as BIRD 1.4.5 and
> > >> > 1.6.3.
> > >> >
> > >> > We plan to announce 184.164.224.0/24 from 8 PEERING locations for a
> > >> > predefined period of 15 minutes starting 14:30 GMT, from Monday to
> > >> > Thursday, between the 7th and 22nd of January, 2019 (full schedule
> and
> > >> > locations [E]). We will stop the experiment immediately in case any
> > >> > issues arise.
> > >> >
> > >> > Although we do not expect the experiment to cause disruption, we
> > >> > welcome feedback on its safety and especially on how to make it
> safer.
> > >> > We can be reached at disco-experim...@googlegroups.com.
> > >> >
> > >> > Amir Herzberg, University of Connecticut
> > >> > Ethan Katz-Bassett, Columbia University
> > >> > Haya Shulman, Fraunhofer SIT
> > >> > Ítalo Cunha, Universidade Federal de Minas Gerais
> > >> > Michael Schapira, Hebrew University of Jerusalem
> > >> > Tomas Hlavacek, Fraunhofer SIT
> > >> > Yossi Gilad, MIT
> > >> >
> > >> > [A] https://conferences.sigcomm.org/hotnets/2018/program.html
> > >> > [B] http://peering.usc.edu
> > >> > [C] https://goo.gl/AFR1Cn
> > >> > [D]
> > >>
> https://labs.ripe.net/Members/erik/ripe-ncc-and-duke-university-bgp-experiment
> > >> > [E] https://goo.gl/nJhmx1
> > >>
> > > --
> > > Ben Cooper
> > > Chief Executive Officer
> > > PacketGG - Multicast
> > > M(Telstra): 0410 411 301
> > > M(Optus):  0434 336 743
> > > E: b...@packet.gg & b...@multicast.net.au
> 

Re: BGP Experiment

2019-01-23 Thread Job Snijders
Dear Ben, all,

I'm not sure this experiment should be canceled. On the public Internet
we MUST assume BGP speakers are compliant with the BGP-4 protocol.
Broken BGP-4 speakers are what they are: broken. They must be fixed, or
the operator must accept the consequences.

"Get a sandbox like every other researcher" is not a fair statement, one
can also posit "Get a compliant BGP-4 implementation like every other
network operator".

When bad guys explicitly seek to target these Asian and Australian
operators you reference (who apparently have not upgraded to the vendor
recommended release), using *valid* BGP updates, will a politely emailed
request help resolve the situation? Of course not!

Stopping the experiment is only treating symptoms, the root cause must
be addressed: broken software.

Kind regards,

Job

On Wed, Jan 23, 2019 at 12:19:09PM -0500, Italo Cunha wrote:
> Ben, NANOG,
> 
> We have canceled this experiment permanently.
> 
> On Wed, Jan 23, 2019 at 12:00 PM Ben Cooper  wrote:
> 
> > Can you stop this?
> >
> > You caused again a massive prefix spike/flap, and as the internet is not
> > centered around NA (shock horror!) a number of operators in Asia and
> > Australia go effected by your “expirment” and had no idea what was
> > happening or why.
> >
> > Get a sandbox like every other researcher, as of now we have black holed
> > and filtered your whole ASN, and have reccomended others do the same.
> >
> > On Wed, 23 Jan 2019 at 1:19 am, Italo Cunha  wrote:
> >
> >> NANOG,
> >>
> >> This is a reminder that this experiment will resume tomorrow
> >> (Wednesday, Jan. 23rd). We will announce 184.164.224.0/24 carrying a
> >> BGP attribute of type 0xff (reserved for development) between 14:00
> >> and 14:15 GMT.
> >>
> >> On Tue, Dec 18, 2018 at 10:05 AM Italo Cunha  wrote:
> >> >
> >> > NANOG,
> >> >
> >> > We would like to inform you of an experiment to evaluate alternatives
> >> > for speeding up adoption of BGP route origin validation (research
> >> > paper with details [A]).
> >> >
> >> > Our plan is to announce prefix 184.164.224.0/24 with a valid
> >> > standards-compliant unassigned BGP attribute from routers operated by
> >> > the PEERING testbed [B, C]. The attribute will have flags 0xe0
> >> > (optional transitive [rfc4271, S4.3]), type 0xff (reserved for
> >> > development), and size 0x20 (256bits).
> >> >
> >> > Our collaborators recently ran an equivalent experiment with no
> >> > complaints or known issues [A], and so we do not anticipate any
> >> > arising. Back in 2010, an experiment using unassigned attributes by
> >> > RIPE and Duke University caused disruption in Internet routing due to
> >> > a bug in Cisco routers [D, CVE-2010-3035]. Since then, this and other
> >> > similar bugs have been patched [e.g., CVE-2013-6051], and new BGP
> >> > attributes have been assigned (BGPsec-path) and adopted (large
> >> > communities). We have successfully tested propagation of the
> >> > announcements on Cisco IOS-based routers running versions 12.2(33)SRA
> >> > and 15.3(1)S, Quagga 0.99.23.1 and 1.1.1, as well as BIRD 1.4.5 and
> >> > 1.6.3.
> >> >
> >> > We plan to announce 184.164.224.0/24 from 8 PEERING locations for a
> >> > predefined period of 15 minutes starting 14:30 GMT, from Monday to
> >> > Thursday, between the 7th and 22nd of January, 2019 (full schedule and
> >> > locations [E]). We will stop the experiment immediately in case any
> >> > issues arise.
> >> >
> >> > Although we do not expect the experiment to cause disruption, we
> >> > welcome feedback on its safety and especially on how to make it safer.
> >> > We can be reached at disco-experim...@googlegroups.com.
> >> >
> >> > Amir Herzberg, University of Connecticut
> >> > Ethan Katz-Bassett, Columbia University
> >> > Haya Shulman, Fraunhofer SIT
> >> > Ítalo Cunha, Universidade Federal de Minas Gerais
> >> > Michael Schapira, Hebrew University of Jerusalem
> >> > Tomas Hlavacek, Fraunhofer SIT
> >> > Yossi Gilad, MIT
> >> >
> >> > [A] https://conferences.sigcomm.org/hotnets/2018/program.html
> >> > [B] http://peering.usc.edu
> >> > [C] https://goo.gl/AFR1Cn
> >> > [D]
> >> https://labs.ripe.net/Members/erik/ripe-ncc-and-duke-university-bgp-experiment
> >> > [E] https://goo.gl/nJhmx1
> >>
> > --
> > Ben Cooper
> > Chief Executive Officer
> > PacketGG - Multicast
> > M(Telstra): 0410 411 301
> > M(Optus):  0434 336 743
> > E: b...@packet.gg & b...@multicast.net.au
> > W: https://packet.gg
> > W: https://multicast.net.au
> >

Re: BGP Experiment

2019-01-23 Thread Italo Cunha
Ben, NANOG,

We have canceled this experiment permanently.

On Wed, Jan 23, 2019 at 12:00 PM Ben Cooper  wrote:

> Can you stop this?
>
> You caused again a massive prefix spike/flap, and as the internet is not
> centered around NA (shock horror!) a number of operators in Asia and
> Australia go effected by your “expirment” and had no idea what was
> happening or why.
>
> Get a sandbox like every other researcher, as of now we have black holed
> and filtered your whole ASN, and have reccomended others do the same.
>
> On Wed, 23 Jan 2019 at 1:19 am, Italo Cunha  wrote:
>
>> NANOG,
>>
>> This is a reminder that this experiment will resume tomorrow
>> (Wednesday, Jan. 23rd). We will announce 184.164.224.0/24 carrying a
>> BGP attribute of type 0xff (reserved for development) between 14:00
>> and 14:15 GMT.
>>
>> On Tue, Dec 18, 2018 at 10:05 AM Italo Cunha  wrote:
>> >
>> > NANOG,
>> >
>> > We would like to inform you of an experiment to evaluate alternatives
>> > for speeding up adoption of BGP route origin validation (research
>> > paper with details [A]).
>> >
>> > Our plan is to announce prefix 184.164.224.0/24 with a valid
>> > standards-compliant unassigned BGP attribute from routers operated by
>> > the PEERING testbed [B, C]. The attribute will have flags 0xe0
>> > (optional transitive [rfc4271, S4.3]), type 0xff (reserved for
>> > development), and size 0x20 (256bits).
>> >
>> > Our collaborators recently ran an equivalent experiment with no
>> > complaints or known issues [A], and so we do not anticipate any
>> > arising. Back in 2010, an experiment using unassigned attributes by
>> > RIPE and Duke University caused disruption in Internet routing due to
>> > a bug in Cisco routers [D, CVE-2010-3035]. Since then, this and other
>> > similar bugs have been patched [e.g., CVE-2013-6051], and new BGP
>> > attributes have been assigned (BGPsec-path) and adopted (large
>> > communities). We have successfully tested propagation of the
>> > announcements on Cisco IOS-based routers running versions 12.2(33)SRA
>> > and 15.3(1)S, Quagga 0.99.23.1 and 1.1.1, as well as BIRD 1.4.5 and
>> > 1.6.3.
>> >
>> > We plan to announce 184.164.224.0/24 from 8 PEERING locations for a
>> > predefined period of 15 minutes starting 14:30 GMT, from Monday to
>> > Thursday, between the 7th and 22nd of January, 2019 (full schedule and
>> > locations [E]). We will stop the experiment immediately in case any
>> > issues arise.
>> >
>> > Although we do not expect the experiment to cause disruption, we
>> > welcome feedback on its safety and especially on how to make it safer.
>> > We can be reached at disco-experim...@googlegroups.com.
>> >
>> > Amir Herzberg, University of Connecticut
>> > Ethan Katz-Bassett, Columbia University
>> > Haya Shulman, Fraunhofer SIT
>> > Ítalo Cunha, Universidade Federal de Minas Gerais
>> > Michael Schapira, Hebrew University of Jerusalem
>> > Tomas Hlavacek, Fraunhofer SIT
>> > Yossi Gilad, MIT
>> >
>> > [A] https://conferences.sigcomm.org/hotnets/2018/program.html
>> > [B] http://peering.usc.edu
>> > [C] https://goo.gl/AFR1Cn
>> > [D]
>> https://labs.ripe.net/Members/erik/ripe-ncc-and-duke-university-bgp-experiment
>> > [E] https://goo.gl/nJhmx1
>>
> --
> Ben Cooper
> Chief Executive Officer
> PacketGG - Multicast
> M(Telstra): 0410 411 301
> M(Optus):  0434 336 743
> E: b...@packet.gg & b...@multicast.net.au
> W: https://packet.gg
> W: https://multicast.net.au
>


Re: BGP Experiment

2019-01-22 Thread Italo Cunha
NANOG,

This is a reminder that this experiment will resume tomorrow
(Wednesday, Jan. 23rd). We will announce 184.164.224.0/24 carrying a
BGP attribute of type 0xff (reserved for development) between 14:00
and 14:15 GMT.

On Tue, Dec 18, 2018 at 10:05 AM Italo Cunha  wrote:
>
> NANOG,
>
> We would like to inform you of an experiment to evaluate alternatives
> for speeding up adoption of BGP route origin validation (research
> paper with details [A]).
>
> Our plan is to announce prefix 184.164.224.0/24 with a valid
> standards-compliant unassigned BGP attribute from routers operated by
> the PEERING testbed [B, C]. The attribute will have flags 0xe0
> (optional transitive [rfc4271, S4.3]), type 0xff (reserved for
> development), and size 0x20 (256bits).
>
> Our collaborators recently ran an equivalent experiment with no
> complaints or known issues [A], and so we do not anticipate any
> arising. Back in 2010, an experiment using unassigned attributes by
> RIPE and Duke University caused disruption in Internet routing due to
> a bug in Cisco routers [D, CVE-2010-3035]. Since then, this and other
> similar bugs have been patched [e.g., CVE-2013-6051], and new BGP
> attributes have been assigned (BGPsec-path) and adopted (large
> communities). We have successfully tested propagation of the
> announcements on Cisco IOS-based routers running versions 12.2(33)SRA
> and 15.3(1)S, Quagga 0.99.23.1 and 1.1.1, as well as BIRD 1.4.5 and
> 1.6.3.
>
> We plan to announce 184.164.224.0/24 from 8 PEERING locations for a
> predefined period of 15 minutes starting 14:30 GMT, from Monday to
> Thursday, between the 7th and 22nd of January, 2019 (full schedule and
> locations [E]). We will stop the experiment immediately in case any
> issues arise.
>
> Although we do not expect the experiment to cause disruption, we
> welcome feedback on its safety and especially on how to make it safer.
> We can be reached at disco-experim...@googlegroups.com.
>
> Amir Herzberg, University of Connecticut
> Ethan Katz-Bassett, Columbia University
> Haya Shulman, Fraunhofer SIT
> Ítalo Cunha, Universidade Federal de Minas Gerais
> Michael Schapira, Hebrew University of Jerusalem
> Tomas Hlavacek, Fraunhofer SIT
> Yossi Gilad, MIT
>
> [A] https://conferences.sigcomm.org/hotnets/2018/program.html
> [B] http://peering.usc.edu
> [C] https://goo.gl/AFR1Cn
> [D] 
> https://labs.ripe.net/Members/erik/ripe-ncc-and-duke-university-bgp-experiment
> [E] https://goo.gl/nJhmx1
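
To make the attribute in the announcement concrete, here is a minimal Python sketch of how a path attribute with flags 0xe0, type 0xff and size 0x20 is laid out on the wire per RFC 4271 section 4.3. The function names are hypothetical. The all-zero payload is an assumption too, since the announcement states only the size of the value, not its contents.

```python
import struct

# Attribute flag bits, RFC 4271 section 4.3 (high-order bits of the octet).
OPTIONAL = 0x80
TRANSITIVE = 0x40
PARTIAL = 0x20
EXTENDED_LENGTH = 0x10


def describe_flags(flags: int) -> list[str]:
    """Name the flag bits that are set in an attribute flags octet."""
    names = []
    if flags & OPTIONAL:
        names.append("optional")
    if flags & TRANSITIVE:
        names.append("transitive")
    if flags & PARTIAL:
        names.append("partial")
    if flags & EXTENDED_LENGTH:
        names.append("extended-length")
    return names


def encode_path_attribute(flags: int, type_code: int, value: bytes) -> bytes:
    """Encode one path attribute as flags, type, length, value."""
    if flags & EXTENDED_LENGTH:
        header = struct.pack("!BBH", flags, type_code, len(value))
    else:
        if len(value) > 0xFF:
            raise ValueError("value needs the Extended Length flag")
        header = struct.pack("!BBB", flags, type_code, len(value))
    return header + value


# Placeholder payload of 0x20 octets (contents are an assumption).
attr = encode_path_attribute(0xE0, 0xFF, bytes(0x20))
print(describe_flags(0xE0))
print(attr[:3].hex(), len(attr))
```

Decoding 0xe0 bit by bit shows that the Partial bit is set alongside Optional and Transitive, and that the Extended Length bit is clear, so the length fits in a single octet.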


Re: BGP Experiment

2019-01-10 Thread Italo Cunha
NANOG,

The FRR devs have released binary packages including the fix and
announced it on the FRR mailing lists.  After considering the feedback
on the list and discussing with FRR devs, we will postpone the
experiments until Jan. 23rd, and have updated the schedule to reflect
the delayed start and shorter timeline [A].  We will follow up with
FRR devs and mailing lists/users.

[A] https://goo.gl/nJhmx1

On Tue, Jan 8, 2019 at 11:41 AM Italo Cunha  wrote:
>
> NANOG,
>
> We've performed the first announcement in this experiment yesterday,
> and, despite the announcement being compliant with BGP standards, FRR
> routers reset their sessions upon receiving it.  Upon notice of the
> problem, we halted the experiments.  The FRR developers confirmed that
> this issue is specific to an unintended consequence of how FRR handles
> the attribute 0xFF (reserved for development) we used.  The FRR devs
> already merged a fix and notified users.
>
> We plan to resume the experiments January 16th (next Wednesday), and
> have updated the experiment schedule [A] accordingly.  As always, we
> welcome your feedback.
>
> [A] https://goo.gl/nJhmx1
>
> On Tue, Dec 18, 2018 at 10:05 AM Italo Cunha  wrote:
> >
> > NANOG,
> >
> > We would like to inform you of an experiment to evaluate alternatives
> > for speeding up adoption of BGP route origin validation (research
> > paper with details [A]).
> >
> > Our plan is to announce prefix 184.164.224.0/24 with a valid
> > standards-compliant unassigned BGP attribute from routers operated by
> > the PEERING testbed [B, C]. The attribute will have flags 0xe0
> > (optional transitive [rfc4271, S4.3]), type 0xff (reserved for
> > development), and size 0x20 (256bits).
> >
> > Our collaborators recently ran an equivalent experiment with no
> > complaints or known issues [A], and so we do not anticipate any
> > arising. Back in 2010, an experiment using unassigned attributes by
> > RIPE and Duke University caused disruption in Internet routing due to
> > a bug in Cisco routers [D, CVE-2010-3035]. Since then, this and other
> > similar bugs have been patched [e.g., CVE-2013-6051], and new BGP
> > attributes have been assigned (BGPsec-path) and adopted (large
> > communities). We have successfully tested propagation of the
> > announcements on Cisco IOS-based routers running versions 12.2(33)SRA
> > and 15.3(1)S, Quagga 0.99.23.1 and 1.1.1, as well as BIRD 1.4.5 and
> > 1.6.3.
> >
> > We plan to announce 184.164.224.0/24 from 8 PEERING locations for a
> > predefined period of 15 minutes starting 14:30 GMT, from Monday to
> > Thursday, between the 7th and 22nd of January, 2019 (full schedule and
> > locations [E]). We will stop the experiment immediately in case any
> > issues arise.
> >
> > Although we do not expect the experiment to cause disruption, we
> > welcome feedback on its safety and especially on how to make it safer.
> > We can be reached at disco-experim...@googlegroups.com.
> >
> > Amir Herzberg, University of Connecticut
> > Ethan Katz-Bassett, Columbia University
> > Haya Shulman, Fraunhofer SIT
> > Ítalo Cunha, Universidade Federal de Minas Gerais
> > Michael Schapira, Hebrew University of Jerusalem
> > Tomas Hlavacek, Fraunhofer SIT
> > Yossi Gilad, MIT
> >
> > [A] https://conferences.sigcomm.org/hotnets/2018/program.html
> > [B] http://peering.usc.edu
> > [C] https://goo.gl/AFR1Cn
> > [D] 
> > https://labs.ripe.net/Members/erik/ripe-ncc-and-duke-university-bgp-experiment
> > [E] https://goo.gl/nJhmx1


Re: BGP Experiment

2019-01-09 Thread Töma Gavrichenkov
Is that a competition in sarcasm? Because I can do better than that!

10 Jan. 2019, 2:41:

> > Töma Gavrichenkov
> > Sent: Wednesday, January 9, 2019 7:08 PM
> >
> > On Wed, Jan 9, 2019 at 10:03 PM Saku Ytti  wrote:
> > > Finding forwarding issues indeed is harder due to the limited access
> > > to devices, so bit of security through obscurity I guess.
> >
> > Or, rather, security by complexity. Today's network infrastructure is
> complex
> > enough for people to dive into it, looking for all the underlying
> issues. Right, it
> > still saves us the day, though today's Web JS frontend is also quite
> complex
> > but it's of no help.
> >
> I don't know about that,
> All modern NPUs are based on the models well explained to the sufficient
> level in Stanford lectures and other materials which you can download
> freely.
> Once you learn how an ideal routing system architecture should look like
> you can start discovering flaws in the vendor NPU blueprints.
> But the best fun is to put IXIA or Spirent in loop with your favourite
> carrier router.
> What I'm trying to say is it's not that complex to find these routing
> system architecture shortcomings  - they will come out of the basic
> platform testing.
>
> adam
>
>


RE: BGP Experiment

2019-01-09 Thread adamv0025
> Töma Gavrichenkov
> Sent: Wednesday, January 9, 2019 7:08 PM
> 
> On Wed, Jan 9, 2019 at 10:03 PM Saku Ytti  wrote:
> > Finding forwarding issues indeed is harder due to the limited access
> > to devices, so bit of security through obscurity I guess.
> 
> Or, rather, security by complexity. Today's network infrastructure is complex
> enough for people to dive into it, looking for all the underlying issues.
> Right, it still saves us the day, though today's Web JS frontend is also
> quite complex but it's of no help.
> 
I don't know about that.
All modern NPUs are based on models that are explained in sufficient detail 
in Stanford lectures and other materials which you can download freely. 
Once you learn what an ideal routing system architecture should look like, you 
can start discovering flaws in the vendor NPU blueprints. 
But the best fun is to put an IXIA or Spirent in a loop with your favourite 
carrier router. 
What I'm trying to say is that it's not that complex to find these routing 
system architecture shortcomings - they will come out of basic platform testing. 

adam   



Re: BGP Experiment

2019-01-09 Thread Töma Gavrichenkov
On Wed, Jan 9, 2019 at 10:33 PM Owen DeLong  wrote:
> Fair enough, but the frequency of vulnerability announcements
> even in some of the best implementations is still more often than
> I think my customers will tolerate reboots.

Well, when I think about it a second time, I can't help
pointing out that there are long-lived efforts by OS developers,
especially embedded and RTOS developers, to come up with live patching.

As the recent you-know-which downtime has shown us, there are
Internet-based services like 911 telephony which really start to treat
the Internet as a whole as a real-time system.  The question here is
whether this encourages e.g. the aforementioned FRR developers (along
with device vendors who actually get paid for uninterruptible BGP
availability) to accept this challenge.

--
Töma


Re: BGP Experiment

2019-01-09 Thread Töma Gavrichenkov
On Wed, Jan 9, 2019 at 10:33 PM Owen DeLong  wrote:
> At the end of the day, this is really about risk analysis
> and it helps to put things into 1 of 4 risk quadrants
> based on two axes… Axis 1 is the likelihood of the
> vulnerability being exploited, while axis 2 is the
> severity of the cost/consequences of exploitation.
>
> Obviously something that scores high on both axes
> will have me rolling out the upgrades as rapidly as
> possible, likely within 24 hours to at least the
> majority of the network.

Good for you (not kidding).  Not quite the same on average, as far as I can see.

> The other two quadrants are a grey area that
> becomes more of a judgment call where other
> factors specific to each operator and their
> customer profile will come into play.
> Some operators may have a high tolerance
> for high-probability low-cost problem, while
> others may find this very urgent, for example.

I agree with you; however, it's the other quadrant (high cost,
seemingly low probability) which is a real gray area IMO which allows
for collateral damage at a Hollywood blockbuster scale.

--
Töma


Re: BGP Experiment

2019-01-09 Thread Owen DeLong



> On Jan 9, 2019, at 10:51 , Saku Ytti  wrote:
> 
> On Wed, 9 Jan 2019 at 20:45, Töma Gavrichenkov  wrote:
> 
>> Nope, this is a misunderstanding. One has to *check* for advisories at
>> least once or twice a week and only update (and reboot if necessary)
>> if there *is* a vulnerability.
> 
> I think this contains some assumptions
> 
> 1. discovering security issues in network devices is expensive (and
> thus only those you glean from vendor notices realistically exist)

Not really… I think the assumption here is that you can’t resolve an issue 
until the vendor publishes the fix. Outside of the open-source routing 
solutions (and even for most deployments, including those), I would say this is 
a valid assertion. (It’s more of an assertion than an assumption, IMHO).

> 2. downside of being affected by network device security issue is expensive

This depends on the issue, right?

Owen



Re: BGP Experiment

2019-01-09 Thread Owen DeLong



> On Jan 9, 2019, at 10:37 , Töma Gavrichenkov  wrote:
> 
> On Wed, Jan 9, 2019 at 9:31 PM Owen DeLong  wrote:
>> So if I understand you correctly, your statement is that everyone
>> should be (potentially) rebooting every core, backbone, edge,
>> and other router at least once or twice a week…
> 
> Nope, this is a misunderstanding. One has to *check* for advisories at
> least once or twice a week and only update (and reboot if necessary)
> if there *is* a vulnerability.
> 
> Checking is quite different from, actually, updating. What you may
> want to encourage your competition to do is to deploy a piece of
> software which actually *gets* a severe CVE twice in a week; that will
> certainly bring you a bunch of new customers.

Fair enough, but vulnerability announcements, even in some of the best 
implementations, still arrive more often than I think my customers will 
tolerate reboots.

At the end of the day, this is really about risk analysis and it helps to put 
things into 1 of 4 risk quadrants based on two axes… Axis 1 is the likelihood 
of the vulnerability being exploited, while axis 2 is the severity of the 
cost/consequences of exploitation.

Obviously something that scores high on both axes will have me rolling out the 
upgrades as rapidly as possible, likely within 24 hours to at least the 
majority of the network.

Something that scores low on both axes, conversely, is likely not worth the 
customer disruption and support call volume (not to mention SLA credits, etc.) 
that come from doing that level of maintenance on short notice (or without 
notice).

The other two quadrants are a grey area that becomes more of a judgment call 
where other factors specific to each operator and their customer profile will 
come into play.

Some operators may have a high tolerance for high-probability, low-cost 
problems, while others may find them very urgent, for example.
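
A minimal sketch of the quadrant logic above, in Python. The 0-to-1 scoring scale, the 0.5 threshold, and the action strings are illustrative assumptions of mine, not numbers anyone in this thread has proposed.

```python
from dataclasses import dataclass

# Hypothetical cut-off separating "low" from "high" on each axis.
THRESHOLD = 0.5


@dataclass
class Advisory:
    name: str
    likelihood: float  # axis 1: likelihood of exploitation, 0.0 to 1.0 (assumed scale)
    severity: float    # axis 2: cost/consequences of exploitation, 0.0 to 1.0 (assumed scale)


def quadrant(adv: Advisory) -> str:
    """Map an advisory onto one of the four risk quadrants."""
    high_likelihood = adv.likelihood >= THRESHOLD
    high_severity = adv.severity >= THRESHOLD
    if high_likelihood and high_severity:
        return "upgrade now"
    if not high_likelihood and not high_severity:
        return "defer to scheduled maintenance"
    # The two mixed quadrants are the grey area; the operator decides.
    return "judgment call"
```

In practice the threshold would differ per operator and customer profile, which is exactly why the two mixed quadrants return a judgment call rather than a fixed action.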

Owen



Re: BGP Experiment

2019-01-09 Thread Töma Gavrichenkov
On Wed, Jan 9, 2019 at 10:03 PM Saku Ytti  wrote:
> Finding forwarding issues indeed is harder due to the limited access
> to devices, so bit of security through obscurity I guess.

Or, rather, security by complexity. Today's network infrastructure is
complex enough to deter people from diving into it and looking for all the
underlying issues. Right, that still saves the day for us, though today's
Web JS frontend is also quite complex and there it's of no help.

--
Töma


Re: BGP Experiment

2019-01-09 Thread Saku Ytti
Hey,

> firmware which only runs on certain expensive devices.  My point is
> that e.g. FRR is an open source software which is designed to run on
> the same Intel-based systems as the one which probably powers your
> laptop.

Most vendors have virtual image for your laptop, all of the modern
routers run Linux and some vendor binary blob, with exception of Nokia
running their own bootingOS (forked off of vxworks ages ago). Finding
control-plane bugs, like BGP UPDATE crash is cheap for hobbyist, you
can download the images off the Internet and run on your laptop.

Finding forwarding issues indeed is harder due to the limited access
to devices, so bit of security through obscurity I guess.

-- 
  ++ytti


Re: BGP Experiment

2019-01-09 Thread Töma Gavrichenkov
On Wed, Jan 9, 2019 at 9:51 PM Saku Ytti  wrote:
> I think this contains some assumptions
>
> 1. discovering security issues in network devices is expensive (and
> thus only those you glean from vendor notices realistically exist)
> 2. downside of being affected by network device security issue is expensive
>
> I'm very skeptical if either are true.

Well, it's significantly harder to look for vulns in closed source
firmware which only runs on certain expensive devices.  My point is
that e.g. FRR is an open source software which is designed to run on
the same Intel-based systems as the one which probably powers your
laptop.

I've received a note from FRR devs stating that they're going to get a
CVE number soon.  It's a good sign, though it should have happened a
bit before roughly a thousand of this mailing list subscribers have
been informed about the issue, but anyway.

--
Töma


Re: BGP Experiment

2019-01-09 Thread Saku Ytti
On Wed, 9 Jan 2019 at 20:45, Töma Gavrichenkov  wrote:

> Nope, this is a misunderstanding. One has to *check* for advisories at
> least once or twice a week and only update (and reboot if necessary)
> if there *is* a vulnerability.

I think this contains some assumptions

1. discovering security issues in network devices is expensive (and
thus only those you glean from vendor notices realistically exist)
2. downside of being affected by network device security issue is expensive

I'm very skeptical if either are true. I think it's very cheap to find
security issues in network devices, particularly DoS issues. And I
don't think downside is expensive, maybe it's bad 4h and lot of angry
customers, but ultimately not that expensive.

I think lot of this is self-organising with delay around rules and
justifications no one understands, and we're not upgrading often,
because it's not (currently) sensible approach.

-- 
  ++ytti


Re: BGP Experiment

2019-01-09 Thread Töma Gavrichenkov
On Wed, Jan 9, 2019 at 9:32 PM Saku Ytti  wrote:
> Those are scheduled, they have to meet some criteria to be pushed on
> scheduled lot. There are also out of cycle SIRTs. And yes, vendors are
> delaying them, because customers don't want to upgrade often, because
> customer's customers don't want to see connections down often.

Yep. The same happened before, e.g. to MSFT products and Adobe Flash,
for a decade before the former started to update within days no
matter what, and before the latter was effectively pushed out of most
market niches.

>>  — just like we did with IoT in 2016 —
> Internet still running, I'm still getting paid.

Well, I know a couple of guys who aren't.

> But motivation to simply DoS internet doesn't really
> exist.

Except for hacktivism, fun, gathering a rep within a cracker society,
gathering a rep within one's middle school community, et cetera.  But
anyway,

> DoS is against service end points, infrastructure is trivial
> target, but for some reason not really targeted.

It really is.  ISPs don't get that quite frequently for now, but
end-user network services sometimes do.

> I'm sure state actors have library of DoS transit packets and
> BGP UPDATE packets to be deployed when strategy requires
> given network or region to be
> disrupted.

There's hardly a reason to rely on your next door neighbor's kid not
chatting on the same Darknet forums where those "state actors" get
their data from.  "State actor" thing is highly overrated today.  They
are certainly powerful but hardly more powerful than a skilled team of
anonymous blackhat researchers going in for ransom money.

--
Töma


Re: BGP Experiment

2019-01-09 Thread Töma Gavrichenkov
On Wed, Jan 9, 2019 at 9:31 PM Owen DeLong  wrote:
> So if I understand you correctly, your statement is that everyone
> should be (potentially) rebooting every core, backbone, edge,
> and other router at least once or twice a week…

Nope, this is a misunderstanding. One has to *check* for advisories at
least once or twice a week and only update (and reboot if necessary)
if there *is* a vulnerability.

Checking is quite different from, actually, updating. What you may
want to encourage your competition to do is to deploy a piece of
software which actually *gets* a severe CVE twice in a week; that will
certainly bring you a bunch of new customers.

--
Töma


Re: BGP Experiment

2019-01-09 Thread Saku Ytti
On Wed, 9 Jan 2019 at 20:24, Töma Gavrichenkov  wrote:

> So, network device vendors releasing security advisories twice a year
> isn't a big part of the explanation?

Those are scheduled, they have to meet some criteria to be pushed on
scheduled lot. There are also out of cycle SIRTs. And yes, vendors are
delaying them, because customers don't want to upgrade often, because
customer's customers don't want to see connections down often.

> Err... don't they?  My experience is quite the opposite.

Well that is odd experience, considering anyone with rudimentary
understanding of control-plane policing can bring internet down from
single VPS. Majority of deployed devices _cannot_ be protected against
DoS motivated attacker, and I'm not talking link congestion, I'm
talking control-plane congestion with few Mbps.

> If we could be sure that after such fuzzing there would still be a
> working transport infrastructure to report on top of, then yes.

If it's important to get right, we should try to prove it wrong
actively and persistently by good guys, at least then reporting and
statistics can be produced. But I'm not sure if it's important to get
right, market seems to indicate security does not matter.

>  — just like we did with IoT in 2016 —

Internet still running, I'm still getting paid.

> > If anything, I suspect if it's cheaper to enter the market with
> > inferior security and quality then that is likely good business case
>
> This is also correct so far. I wonder if it's here to stay.

We'd need the current security posture to be sufficiently
unmarketable. But motivation to simply DoS internet doesn't really
exist. DoS is against service end points, infrastructure is trivial
target, but for some reason not really targeted. I'm sure state actors
have library of DoS transit packets and BGP UPDATE packets to be
deployed when strategy requires given network or region to be
disrupted. Because, we, the internet plumbers, keep finding those
without trying, just trying to keep the network working, what can
someone find who is funded and motivated to find those?



-- 
  ++ytti


Re: BGP Experiment

2019-01-09 Thread Owen DeLong


> On Jan 9, 2019, at 09:51 , Töma Gavrichenkov  wrote:
> 
> 9 Jan. 2019, 9:56, Randy Bush <ra...@psg.com>:
> > the question is how soon the frr
> > users out on the internet will upgrade.
> > there are a lot of studies on
> > this.  it sure isn't on the order of a week
> 
> Which is, as usual, a pity, because, generally, synchronizing a piece of 
> software with upstream security updates less frequently than once to twice in 
> a week belongs in Jurassic Park today; and doing it hardly more frequently 
> than once in 6 months, as ISPs usually do, clearly belongs in a bughouse.
> 
> (wonder if this FRR update has got a CVE number though)

So if I understand you correctly, your statement is that everyone should be 
(potentially) rebooting every core, backbone, edge, and other router at least 
once or twice a week…

To quote Randy Bush… I encourage my competitors to try this.

Owen



Re: BGP Experiment

2019-01-09 Thread Töma Gavrichenkov
On Wed, Jan 9, 2019 at 9:07 PM Saku Ytti  wrote:
> Not disputing bug or bog house as ideal location for said policy, just
> want to explain my perspective why it is so.

So, network device vendors releasing security advisories twice a year
isn't a big part of the explanation?

> Hitless upgrades are not really a thing yet, even though they've been
> marketed for 20 years now.

This is correct; on the flip side, hitless vulnerabilities haven't
even been marketed, much less invented.

> Only reason things work as well as they do, is because bad
> guys are not trying to DoS the infrastructure with BGP or
> packet-of-deaths

Err... don't they?  My experience is quite the opposite.

> If this is something we think should be fixed, then we should have
> good guys intentionally fuzzing _public internet_ BGP and
> transit-packet-of-deaths with good reporting.

If we could be sure that after such fuzzing there would still be a
working transport infrastructure to report on top of, then yes.

> if they are abused, Internet will fix those in no more than
> days

 — just like we did with IoT in 2016 —

> and trying to guarantee it cannot happen probably is fool's
> errand

> If anything, I suspect if it's cheaper to enter the market with
> inferior security and quality then that is likely good business case

This is also correct so far. I wonder if it's here to stay.

--
Töma


Re: BGP Experiment

2019-01-09 Thread Saku Ytti
On Wed, 9 Jan 2019 at 19:54, Töma Gavrichenkov  wrote:

> Which is, as usual, a pity, because, generally, synchronizing a piece of 
> software with upstream security updates less frequently than once to twice in 
> a week belongs in Jurassic Park today; and doing it hardly more frequently 
> than once in 6 months, as ISPs usually do, clearly belongs in a bughouse.

Not disputing bug or bog house as ideal location for said policy, just
want to explain my perspective why it is so. SPs are making their
reasonable effort to produce product that customers want to buy.
Hitless upgrades are not really a thing yet, even though they've been
marketed for 20 years now. Customers have expectation on how often
their link flaps which is mutually exclusive with rapid upgrade
cycles.

And mostly all this is for show, the code is very broken, all of it.
And the configurations are very broken, all of them. We regularly
break Internet without trying, BGP parsing crashes are like bi-annual
thing. I'm holding, without any motivation or attempt to do so,
transit-packet-of-death for JNPR applicable to ~all JNPR backbones,
and JNPR isn't outlier here. People happily deploy new devices which
cannot be protected against even trivial (<10Mbps) control-plane
attacks. Only reason things work as well as they do, is because bad
guys are not trying to DoS the infrastructure with BGP or
packet-of-deaths, it would be very cheap if someone should be so
motivated.

If this is something we think should be fixed, then we should have
good guys intentionally fuzzing _public internet_ BGP and
transit-packet-of-deaths with good reporting. But likely it doesn't
actually matter at all that the configurations and implementations are
fragile, if they are abused, Internet will fix those in no more than
days, and trying to guarantee it cannot happen probably is fool's
errand.

If anything, I suspect if it's cheaper to enter the market with
inferior security and quality then that is likely good business case,
internet works so well, consumers are not willing to pay more for
better, but would gladly sacrifice uptime for cheaper price.



-- 
  ++ytti


Re: BGP Experiment

2019-01-09 Thread Töma Gavrichenkov
9 Jan. 2019, 9:56, Randy Bush:
> the question is how soon the frr
> users out on the internet will upgrade.
> there are a lot of studies on
> this.  it sure isn't on the order of a week

Which is, as usual, a pity, because, generally, synchronizing a piece of
software with upstream security updates less frequently than once to twice
in a week belongs in Jurassic Park today; and doing it hardly more
frequently than once in 6 months, as ISPs usually do, clearly belongs in a
bughouse.

(wonder if this FRR update has got a CVE number though)


Re: BGP Experiment

2019-01-09 Thread Owen DeLong



> On Jan 8, 2019, at 09:06 , valdis.kletni...@vt.edu wrote:
> 
> On Tue, 08 Jan 2019 17:48:46 +0100, niels=na...@bakker.net said:
> 
>> After seeing this initial result I'm wondering why the researchers 
>> couldn't set up their own sandbox first before breaking code on the 
>> internet.  I believe FRR is a free download and comes with GNU autoconf.
> 
> Perhaps you'd like to supply the researchers (and us) with a *complete*
> list of all BGP-speaking software in use on the Internet? (Personally, I'd
> never heard of FRR before)

+1



Re: BGP Experiment

2019-01-08 Thread Tore Anderson
* Job Snijders

> Given the severity of the bug, there is a strong incentive for people to 
> upgrade ASAP.

The buggy code path can also be disabled without upgrading, by building
FRR with the --disable-bgp-vnc configure option, as I understand it.

I've been told that this is the default in Cumulus Linux.

Tore


Re: BGP Experiment

2019-01-08 Thread Job Snijders
On Wed, Jan 9, 2019 at 9:55 Randy Bush  wrote:

> >>> We plan to resume the experiments January 16th (next Wednesday), and
> >>> have updated the experiment schedule [A] accordingly.  As always, we
> >>> welcome your feedback.
> >> i did not realize that frr updates propagated so quickly.  very cool.
> >
> > FRR is undergoing a fairly rapid pace of development
>
> that is impressive but irrelevant.  the question is how soon the frr
> users out on the internet will upgrade.  there are a lot of studies on
> this.  it sure isn't on the order of a week.



Given the severity of the bug, there is a strong incentive for people to
upgrade ASAP.

Kind regards,

Job


Re: BGP Experiment

2019-01-08 Thread Randy Bush
>>> We plan to resume the experiments January 16th (next Wednesday), and
>>> have updated the experiment schedule [A] accordingly.  As always, we
>>> welcome your feedback.
>> i did not realize that frr updates propagated so quickly.  very cool.
>
> FRR is undergoing a fairly rapid pace of development

that is impressive but irrelevant.  the question is how soon the frr
users out on the internet will upgrade.  there are a lot of studies on
this.  it sure isn't on the order of a week.

randy


Re: BGP Experiment

2019-01-08 Thread Eric Kuhnke
FRR is undergoing a fairly rapid pace of development, thanks to the
cloud-scale operators and hosting providers which are using it in
production.

https://cumulusnetworks.com/blog/welcoming-frrouting-to-the-linux-foundation/

On Tue, Jan 8, 2019 at 11:55 AM Randy Bush  wrote:

> > We plan to resume the experiments January 16th (next Wednesday), and
> > have updated the experiment schedule [A] accordingly.  As always, we
> > welcome your feedback.
>
> i did not realize that frr updates propagated so quickly.  very cool.
>
> randy
>


RE: BGP Experiment

2019-01-08 Thread adamv0025
> Steve Noble
> Sent: Tuesday, January 8, 2019 6:42 PM
> 
> There is no such thing as a fully RFC compliant BGP :
> 
Which RFC do you mean (6286, 6608, 6793, 7606, 7607, 7705 or 8212) when you 
say fully RFC compliant BGP, please?

> https://www.juniper.net/documentation/en_US/junos/topics/reference/standards/bgp.html
> does not list 7606
> 
> Cisco Bug: CSCvf06327 - Error Handling for RFC 7606 not implemented for
> NXOS
> 
> This is as of today and a 2 second google search.. anyone running code from
> before RFC 7606 (2015) would also not be compliant.
> 
With regards to Revised Error Handling for BGP UPDATE Messages RFC 7606,
My recollection is there was a very long discussion with working code preceding 
the various drafts as well as the final RFC standard.
Regarding the Juniper case specifically a bit of googling reveals that:
All Junos software releases built on or after 2009-06-29 have been enhanced to 
be more tolerant of malformed optional, transitive attributes. Releases 
containing the coding change specifically include: 9.1S2, 9.3R3, 9.6R1 and all 
subsequent releases (i.e. all releases built after 9.6R1).
So it's not quite black and white: there will be levels of protection 
available in current releases (albeit not fully compliant with the RFC per se).  
The question is whether folks out there actually have it enabled.
Oh, and then there are bugs associated with the new feature (like the one in 
some versions of Junos which, upon receiving a malformed update, won't bring the 
session down but rather the whole rpd if the bgp-error-tolerance feature is 
enabled).
 

adam

  



Re: BGP Experiment

2019-01-08 Thread Stephen Satchell
On 1/8/19 9:31 AM, Töma Gavrichenkov wrote:
> 8 Jan. 2019, 20:19:
>> In the real world, doing the correct thing
> 
> — such as writing RFC compliant code —
> 
>> is often harder than doing
>> an incorrect thing, yes.
> 
> Evidently, yes.

I "grew up" during the early days of PPP.  As a member of the press I
attended an "inter-op" session at Telebit's campus, and watched as a
collection of engineers and programmers matched up implementations of
PPP and found bugs in both the Proposed Standard and in the
implementations thereof.

Watching these guys with all sorts of data monitors trying to figure out
who goofed was an interesting and fascinating experience.

My stint with the Telecommunications Industry Association TR-30
committee, hashing out modem standards like V.32 et al and V.25 ter, was a
similar exercise -- one that led to me being in a near fight in a
parking lot in San Jose with a Microsoft engineer over clarity problems
with the proposed Standard for side-channel protocol.  "Can you do
better?"  "Yes."  "Prove it."  And I did.  My proposal was accepted by
all, even the Microsoft guy.

(We continued to collaborate until he cashed out of the company.)


Re: BGP Experiment

2019-01-08 Thread Randy Bush
> We plan to resume the experiments January 16th (next Wednesday), and
> have updated the experiment schedule [A] accordingly.  As always, we
> welcome your feedback.

i did not realize that frr updates propagated so quickly.  very cool.

randy


Re: BGP Experiment

2019-01-08 Thread Steve Noble

There is no such thing as a fully RFC compliant BGP :

https://www.juniper.net/documentation/en_US/junos/topics/reference/standards/bgp.html 
does not list 7606


Cisco Bug: CSCvf06327 - Error Handling for RFC 7606 not implemented for NXOS

This is as of today and a 2 second google search.. anyone running code 
from before RFC 7606 (2015) would also not be compliant.


I did not see Juniper on the list of BGP speakers tested.

Töma Gavrichenkov wrote on 1/8/19 9:31 AM:

8 Jan. 2019, 20:19, <na...@bakker.net>:
> In the real world, doing the correct thing

— such as writing RFC compliant code —

> is often harder than doing
> an incorrect thing, yes.

Evidently, yes.





Re: BGP Experiment

2019-01-08 Thread Töma Gavrichenkov
8 Jan. 2019, 20:19:
> In the real world, doing the correct thing

— such as writing RFC compliant code —

> is often harder than doing
> an incorrect thing, yes.

Evidently, yes.



Re: BGP Experiment

2019-01-08 Thread Jared Mauch



> On Jan 8, 2019, at 12:10 PM, niels=na...@bakker.net wrote:
> 
> * thomasam...@gmail.com (Tom Ammon) [Tue 08 Jan 2019, 17:59 CET]:
>> There are a fair number of open source BGP implementations now. It would 
>> require additional effort to test all of them.
> 
> In the real world, doing the correct thing is often harder than doing an 
> incorrect thing, yes.
> 

And other times you just get BGP as art

https://twitter.com/powerdns_bert/status/878291436034170881

- jared

Re: BGP Experiment

2019-01-08 Thread niels=nanog

Hi Saku,

After seeing this initial result I'm wondering why the researchers 
couldn't set up their own sandbox first before breaking code on the 
internet.  I believe FRR is a free download and comes with GNU autoconf.


We probably should avoid anything which might demotivate future good
guys from finding breaking bugs and reporting them, while sending
perfectly standard-compliant messages. Only ones who will win are bad
guys who collect libraries of how-to-break-internet.
There are certainly several transit packet of deaths and BGP parser
bugs in each implementation, I'd rather have good guy trigger them and
give me details why my network broke, than have bad guy store them for
future use.


I fully agree with you.  However, this doesn't give 'good guys' carte 
blanche to break stuff.  I'm glad they've already taken action to 
improve their practices as confirmed by Italo Cunha in his earlier mail.



-- Niels.


Re: BGP Experiment

2019-01-08 Thread Jared Mauch



> On Jan 8, 2019, at 12:06 PM, valdis.kletni...@vt.edu wrote:
> 
> On Tue, 08 Jan 2019 17:48:46 +0100, niels=na...@bakker.net said:
> 
>> After seeing this initial result I'm wondering why the researchers 
>> couldn't set up their own sandbox first before breaking code on the 
>> internet.  I believe FRR is a free download and comes with GNU autoconf.
> 
> Perhaps you'd like to supply the researchers (and us) with a *complete*
> list of all BGP-speaking software in use on the Internet? (Personally, I'd
> never heard of FRR before)

Yeah, I think it also gets complicated as some of us also have our own internal 
BGP speakers as well.  Taking MRT files from route-views or RIPE RIS and 
replaying them is certainly helpful to simulate certain events.  I’ve found a 
lot of interesting “new attribute” experiments when I had a poorly written MRT 
parser that would trigger periodically when something new hit the internet.

(FRR is a descendant of the Zebra/Quagga world)
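
For anyone wanting to try the MRT replay idea, here is a rough sketch in Python of walking the records in an MRT dump. It relies only on the MRT common header layout from RFC 6396 (timestamp, type, subtype, length). The function name `iter_mrt_records` is my own, and the sketch does not decode the BGP payload at all; unknown record types are handed back to the caller instead of causing a failure.

```python
import struct
from typing import Iterator, Tuple

# MRT common header (RFC 6396): 4-byte timestamp, 2-byte type,
# 2-byte subtype, 4-byte length of the message that follows.
MRT_HEADER = struct.Struct("!IHHI")


def iter_mrt_records(data: bytes) -> Iterator[Tuple[int, int, int, bytes]]:
    """Yield (timestamp, type, subtype, body) for each record in data.

    Unknown types are yielded untouched so a new record type or
    attribute on the wire does not break the walk. Trailing bytes
    shorter than a header are ignored.
    """
    offset = 0
    while offset + MRT_HEADER.size <= len(data):
        timestamp, mrt_type, subtype, length = MRT_HEADER.unpack_from(data, offset)
        offset += MRT_HEADER.size
        body = data[offset:offset + length]
        if len(body) < length:
            raise ValueError("truncated MRT record")
        offset += length
        yield timestamp, mrt_type, subtype, body
```

Feeding it the decompressed contents of a route-views or RIPE RIS dump and counting the (type, subtype) pairs is a cheap first check before replaying anything at a real BGP speaker.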

- Jared

Re: BGP Experiment

2019-01-08 Thread Job Snijders
On Tue, Jan 8, 2019 at 19:59 Tom Ammon  wrote:

> On Tue, Jan 8, 2019, 11:50 AM 
>> * cu...@dcc.ufmg.br (Italo Cunha) [Tue 08 Jan 2019, 17:42 CET]:
>> >[A] https://goo.gl/nJhmx1
>>
>> For the archives, since goo.gl will cease to exist soon, this links to
>>
>> https://docs.google.com/spreadsheets/d/1U42-HCi3RzXkqVxd8e2yLdK9okFZl77tWZv13EsEzO0/htmlview
>>
>> After seeing this initial result I'm wondering why the researchers
>> couldn't set up their own sandbox first before breaking code on the
>> internet.  I believe FRR is a free download and comes with GNU autoconf.
>>
>
> There are a fair number of open source BGP implementations now. It would
> require additional effort to test all of them.
>


Not just every implementation, but also every version, and every
configuration permutation. This type of black box testing is not scalable.
It is not feasible work, nor the job of these researchers. It’s the job of
the software developer to ensure the product is standards compliant.

In the case of FRR:

- improper use of the 0xFF codepoint
- FRR is not compliant with RFC 7606 (the devs indicated they will be
working on this)
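
To illustrate what RFC 7606 changes, here is a heavily simplified Python sketch of choosing an error-handling action for a malformed path attribute. The function and enum names are mine, and the table covers only two attribute types; the RFC itself has per-attribute rules that this sketch does not attempt to reproduce.

```python
from enum import Enum


class Action(Enum):
    SESSION_RESET = "session reset"
    TREAT_AS_WITHDRAW = "treat-as-withdraw"
    ATTRIBUTE_DISCARD = "attribute discard"


# Attribute type codes from RFC 4271.
ATOMIC_AGGREGATE = 6
AGGREGATOR = 7

# Attributes that do not affect route selection can simply be dropped
# when malformed. Only these two are listed in this sketch.
DISCARDABLE = {ATOMIC_AGGREGATE, AGGREGATOR}


def action_for_malformed_attribute(type_code: int, nlri_parseable: bool) -> Action:
    """Pick the least disruptive action for a malformed attribute."""
    if not nlri_parseable:
        # Without the prefixes, the routes cannot be withdrawn individually.
        return Action.SESSION_RESET
    if type_code in DISCARDABLE:
        return Action.ATTRIBUTE_DISCARD
    # Simplified default: keep the session up, treat the routes as withdrawn.
    return Action.TREAT_AS_WITHDRAW
```

The point of the design is that a single bad UPDATE should cost at most the affected routes, not the whole session, as long as the prefixes in the message can still be identified.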

Ultimately, the developers are responsible for their product, not random
other internet users. This situation was avoidable if standards had been
followed.

I’m happy the FRR developers quickly identified the issue and published a
fix. We can now all move on.

Kind regards,

Job



Re: BGP Experiment

2019-01-08 Thread niels=nanog

* valdis.kletni...@vt.edu (valdis.kletni...@vt.edu) [Tue 08 Jan 2019, 18:06 
CET]:

(Personally, I'd never heard of FRR before)


Martin Winter of OSR/FRR has attended many a NANOG, RIPE and other 
industry meetings, so it's not for their lack of trying



-- Niels.


Re: BGP Experiment

2019-01-08 Thread niels=nanog

* thomasam...@gmail.com (Tom Ammon) [Tue 08 Jan 2019, 17:59 CET]:
There are a fair number of open source BGP implementations now. It 
would require additional effort to test all of them.


In the real world, doing the correct thing is often harder than doing 
an incorrect thing, yes.



-- Niels.


Re: BGP Experiment

2019-01-08 Thread Nick Hilliard

niels=na...@bakker.net wrote on 08/01/2019 16:48:
After seeing this initial result I'm wondering why the researchers 
couldn't set up their own sandbox first before breaking code on the 
internet.  I believe FRR is a free download and comes with GNU autoconf.


the researchers didn't break code - their test unearthed broken code.

That code has now been fixed, so this is a good result.

Nick


Re: BGP Experiment

2019-01-08 Thread valdis . kletnieks
On Tue, 08 Jan 2019 17:48:46 +0100, niels=na...@bakker.net said:

> After seeing this initial result I'm wondering why the researchers 
> couldn't set up their own sandbox first before breaking code on the 
> internet.  I believe FRR is a free download and comes with GNU autoconf.

Perhaps you'd like to supply the researchers (and us) with a *complete*
list of all BGP-speaking software in use on the Internet? (Personally, I'd
never heard of FRR before)


Re: BGP Experiment

2019-01-08 Thread Saku Ytti
Hey,

> After seeing this initial result I'm wondering why the researchers
> couldn't set up their own sandbox first before breaking code on the
> internet.  I believe FRR is a free download and comes with GNU autoconf.

We probably should avoid anything which might demotivate future good
guys from finding breaking bugs and reporting them, while sending
perfectly standard-compliant messages. Only ones who will win are bad
guys who collect libraries of how-to-break-internet.
There are certainly several transit packet of deaths and BGP parser
bugs in each implementation, I'd rather have good guy trigger them and
give me details why my network broke, than have bad guy store them for
future use.

-- 
  ++ytti


Re: BGP Experiment

2019-01-08 Thread Italo Cunha
Hi Niels, we did run the experiment in a controlled environment with
different versions of Cisco, BIRD, and Quagga routers and observed no
issues. We did add FRR to the test suite yesterday for future tests.


On Tue, Jan 8, 2019 at 11:49 AM  wrote:
>
> * cu...@dcc.ufmg.br (Italo Cunha) [Tue 08 Jan 2019, 17:42 CET]:
> >[A] https://goo.gl/nJhmx1
>
> For the archives, since goo.gl will cease to exist soon, this links to
> https://docs.google.com/spreadsheets/d/1U42-HCi3RzXkqVxd8e2yLdK9okFZl77tWZv13EsEzO0/htmlview
>
> After seeing this initial result I'm wondering why the researchers
> couldn't set up their own sandbox first before breaking code on the
> internet.  I believe FRR is a free download and comes with GNU autoconf.
>
>
> -- Niels.


Re: BGP Experiment

2019-01-08 Thread Tom Ammon
On Tue, Jan 8, 2019, 11:50 AM  wrote:
> * cu...@dcc.ufmg.br (Italo Cunha) [Tue 08 Jan 2019, 17:42 CET]:
> >[A] https://goo.gl/nJhmx1
>
> For the archives, since goo.gl will cease to exist soon, this links to
>
> https://docs.google.com/spreadsheets/d/1U42-HCi3RzXkqVxd8e2yLdK9okFZl77tWZv13EsEzO0/htmlview
>
> After seeing this initial result I'm wondering why the researchers
> couldn't set up their own sandbox first before breaking code on the
> internet.  I believe FRR is a free download and comes with GNU autoconf.
>
>
> -- Niels.
>

There are a fair number of open source BGP implementations now. It would
require additional effort to test all of them.

Tom



Re: BGP Experiment

2019-01-08 Thread niels=nanog

* cu...@dcc.ufmg.br (Italo Cunha) [Tue 08 Jan 2019, 17:42 CET]:

[A] https://goo.gl/nJhmx1


For the archives, since goo.gl will cease to exist soon, this links to
https://docs.google.com/spreadsheets/d/1U42-HCi3RzXkqVxd8e2yLdK9okFZl77tWZv13EsEzO0/htmlview

After seeing this initial result I'm wondering why the researchers 
couldn't set up their own sandbox first before breaking code on the 
internet.  I believe FRR is a free download and comes with GNU autoconf.



-- Niels.


Re: BGP Experiment

2019-01-08 Thread Italo Cunha
NANOG,

We performed the first announcement in this experiment yesterday,
and, despite the announcement being compliant with BGP standards, FRR
routers reset their sessions upon receiving it.  Upon notice of the
problem, we halted the experiments.  The FRR developers confirmed that
this issue is specific to an unintended consequence of how FRR handles
the attribute 0xFF (reserved for development) we used.  The FRR devs
already merged a fix and notified users.

We plan to resume the experiments January 16th (next Wednesday), and
have updated the experiment schedule [A] accordingly.  As always, we
welcome your feedback.

[A] https://goo.gl/nJhmx1

On Tue, Dec 18, 2018 at 10:05 AM Italo Cunha  wrote:
>
> NANOG,
>
> We would like to inform you of an experiment to evaluate alternatives
> for speeding up adoption of BGP route origin validation (research
> paper with details [A]).
>
> Our plan is to announce prefix 184.164.224.0/24 with a valid
> standards-compliant unassigned BGP attribute from routers operated by
> the PEERING testbed [B, C]. The attribute will have flags 0xe0
> (optional transitive [rfc4271, S4.3]), type 0xff (reserved for
> development), and size 0x20 (256bits).
>
> Our collaborators recently ran an equivalent experiment with no
> complaints or known issues [A], and so we do not anticipate any
> arising. Back in 2010, an experiment using unassigned attributes by
> RIPE and Duke University caused disruption in Internet routing due to
> a bug in Cisco routers [D, CVE-2010-3035]. Since then, this and other
> similar bugs have been patched [e.g., CVE-2013-6051], and new BGP
> attributes have been assigned (BGPsec-path) and adopted (large
> communities). We have successfully tested propagation of the
> announcements on Cisco IOS-based routers running versions 12.2(33)SRA
> and 15.3(1)S, Quagga 0.99.23.1 and 1.1.1, as well as BIRD 1.4.5 and
> 1.6.3.
>
> We plan to announce 184.164.224.0/24 from 8 PEERING locations for a
> predefined period of 15 minutes starting 14:30 GMT, from Monday to
> Thursday, between the 7th and 22nd of January, 2019 (full schedule and
> locations [E]). We will stop the experiment immediately in case any
> issues arise.
>
> Although we do not expect the experiment to cause disruption, we
> welcome feedback on its safety and especially on how to make it safer.
> We can be reached at disco-experim...@googlegroups.com.
>
> Amir Herzberg, University of Connecticut
> Ethan Katz-Bassett, Columbia University
> Haya Shulman, Fraunhofer SIT
> Ítalo Cunha, Universidade Federal de Minas Gerais
> Michael Schapira, Hebrew University of Jerusalem
> Tomas Hlavacek, Fraunhofer SIT
> Yossi Gilad, MIT
>
> [A] https://conferences.sigcomm.org/hotnets/2018/program.html
> [B] http://peering.usc.edu
> [C] https://goo.gl/AFR1Cn
> [D] https://labs.ripe.net/Members/erik/ripe-ncc-and-duke-university-bgp-experiment
> [E] https://goo.gl/nJhmx1


Re: BGP Experiment

2018-12-20 Thread Job Snijders
Dear Italo,

Thanks for giving the community a heads-up on your plan! I think
announcements like this are the best anyone can do when trying legal
but new BGP path attributes.

I'll forward this message to other NOGs and make sure that our NOC
adds it to their calendar.

Kind regards,

Job

On Thu, Dec 20, 2018 at 6:39 PM Italo Cunha wrote:
>
> NANOG,
>
> We would like to inform you of an experiment to evaluate alternatives
> for speeding up adoption of BGP route origin validation (research
> paper with details [A]).
>
> Our plan is to announce prefix 184.164.224.0/24 with a valid
> standards-compliant unassigned BGP attribute from routers operated by
> the PEERING testbed [B, C]. The attribute will have flags 0xe0
> (optional transitive [rfc4271, S4.3]), type 0xff (reserved for
> development), and size 0x20 (256 bits).
>
> Our collaborators recently ran an equivalent experiment with no
> complaints or known issues [A], and so we do not anticipate any
> arising. Back in 2010, an experiment using unassigned attributes by
> RIPE and Duke University caused disruption in Internet routing due to
> a bug in Cisco routers [D, CVE-2010-3035]. Since then, this and other
> similar bugs have been patched [e.g., CVE-2013-6051], and new BGP
> attributes have been assigned (BGPsec-path) and adopted (large
> communities). We have successfully tested propagation of the
> announcements on Cisco IOS-based routers running versions 12.2(33)SRA
> and 15.3(1)S, Quagga 0.99.23.1 and 1.1.1, as well as BIRD 1.4.5 and
> 1.6.3.
>
> We plan to announce 184.164.224.0/24 from 8 PEERING locations for a
> predefined period of 15 minutes starting 14:30 GMT, from Monday to
> Thursday, between the 7th and 22nd of January, 2019 (full schedule and
> locations [E]). We will stop the experiment immediately in case any
> issues arise.
>
> Although we do not expect the experiment to cause disruption, we
> welcome feedback on its safety and especially on how to make it safer.
> We can be reached at disco-experim...@googlegroups.com.
>
> Amir Herzberg, University of Connecticut
> Ethan Katz-Bassett, Columbia University
> Haya Shulman, Fraunhofer SIT
> Ítalo Cunha, Universidade Federal de Minas Gerais
> Michael Schapira, Hebrew University of Jerusalem
> Tomas Hlavacek, Fraunhofer SIT
> Yossi Gilad, MIT
>
> [A] https://conferences.sigcomm.org/hotnets/2018/program.html
> [B] http://peering.usc.edu
> [C] https://goo.gl/AFR1Cn
> [D] https://labs.ripe.net/Members/erik/ripe-ncc-and-duke-university-bgp-experiment
> [E] https://goo.gl/nJhmx1