Re: [routing-wg] RIPE NCC RPKI Routing Update May 2024

2024-05-20 Thread Job Snijders via routing-wg

Dear RIPE NCC RPKI team,

< Speaking with no hats >

Thank you for sharing current status, plans, and reasoning behind
prioritization choices. I'm excited to see RIPE NCC's 'hosted' RPKI
service offering starting to move beyond "just ROAs".

Kind regards,

Job

On Mon, May 20, 2024 at 10:35:51AM +0200, Tim Bruijnzeels wrote:
> Dear colleagues,
> 
> Due to a full agenda, there will be no RIPE NCC RPKI update during the 
> Routing Working Group session at RIPE 88. Instead, I will give a short 
> high-level update during the RIPE NCC Services Working Group session. I would 
> also like to share a more detailed update on our work and changes since RIPE 
> 87 and our plans for the coming months.
> 
> If you have any questions or comments, or if you want to express your ideas 
> on priorities, then please don't hesitate to talk to me in person at the RIPE 
> Meeting, join the RIPE NCC Services Working Group session or discuss on this 
> mailing list.
> 
> RPKI Compliance Project (ISAE3000)
> ==================================
> 
> The RIPE NCC is currently conducting an ISAE3000/SOC 2 audit for the RPKI 
> service. The SOC 2 type I audit is in its final stages. We are planning to 
> continue to work on a (recurring) SOC 2 type II audit in the years to come. 
> If you want to know more about this project, then I recommend you watch the 
> Technology Update that Felipe Victoria Silveira will give during the RIPE NCC 
> Services WG session at RIPE 88.
> 
> Supporting the audit has taken significant resources from the development 
> team, but on the positive side, this has forced us to critically think about 
> all aspects of the RPKI service: software, infrastructure, processes and the 
> supporting organisation. As a result, we have made small but significant 
> improvements such as providing more human-friendly insight into the Trust 
> Anchor signing process, improving the database point-in-time recovery options 
> (using write-ahead-logs), formally capturing existing engineering knowledge 
> in a business continuity plan, and updating the Certification Practice 
> Statement for transparency.
> 
> ASPA Support in the Pilot
> =========================
> 
> ASPA has been supported in the RIPE NCC RPKI Pilot (‘localcert’) environment 
> since November 2022. We updated the implementation to use the latest ASPA 
> profile in January 2024. More information on using the API can be found in 
> this email sent to the IETF Sidrops mailing list:
> https://mailarchive.ietf.org/arch/msg/sidrops/K_d8S0ZDXnK0-vXD33uyHc6RnkE/
> 
> For those unfamiliar with ASPA, the very short summary of it is that it 
> allows AS holders to declare who their provider ASNs are - in a BGP path 
> sense, not necessarily in a business relation sense. This can be used to 
> detect route leaks and to some degree (mainly dependent on uptake) path 
> spoofing attacks against ROV.
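To make the ASPA idea more concrete, here is a minimal Python sketch of the *upstream* path check, modelled on the hop-by-hop check in the ASPA verification drafts. It is deliberately simplified: it ignores AS_SETs and the separate downstream procedure, the ASPA data is a hypothetical in-memory table, and all ASNs come from the documentation range. Each hop asks whether the customer AS attested the next AS as one of its providers; a single unauthorized hop makes the path Invalid.

```python
from typing import Dict, List, Set

# Hypothetical ASPA data: customer ASN -> set of provider ASNs it attested.
AspaTable = Dict[int, Set[int]]


def hop_check(aspas: AspaTable, customer: int, provider: int) -> str:
    """Classify one hop: did `customer` attest `provider` as a provider?"""
    if customer not in aspas:
        return "No Attestation"
    return "Provider+" if provider in aspas[customer] else "Not Provider+"


def verify_upstream(aspas: AspaTable, as_path: List[int]) -> str:
    """Simplified upstream check; as_path is neighbor-first, origin-last."""
    # Collapse AS_PATH prepending (consecutive duplicate ASNs).
    path = [asn for i, asn in enumerate(as_path) if i == 0 or asn != as_path[i - 1]]
    # Walk from the origin towards the neighbor, checking each customer->provider hop.
    hops = [hop_check(aspas, path[i], path[i - 1]) for i in range(len(path) - 1, 0, -1)]
    if "Not Provider+" in hops:
        return "Invalid"  # some AS did not authorize the next AS as its provider
    if "No Attestation" in hops:
        return "Unknown"
    return "Valid"


aspas = {64500: {64510}, 64510: {64520}}
print(verify_upstream(aspas, [64520, 64510, 64500]))  # Valid
print(verify_upstream(aspas, [64530, 64500]))         # Invalid: 64530 is not a provider of 64500
print(verify_upstream(aspas, [64520, 64510, 64501]))  # Unknown: 64501 has no ASPA
```

The "Unknown" outcome shows why the effectiveness against path spoofing depends on uptake: without attestations, a hop cannot be judged.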
> 
> For a longer explanation, I would like to refer you to my talk at SEE 12, 
> where I try to explain ASPA using examples [1]. For a longer, and more 
> precise talk, I can recommend a presentation given by Ben Maddison at AfPIF 
> 2023 [2]. For a more fundamental understanding, you can read up on the formal 
> specification of the verification [3] and proof of correctness [4]. Of 
> course, there is more information available. So, if anyone has other useful 
> pointers here, please don’t hesitate to mention them.
> 
> RPKI Dashboard Improvements
> ===========================
> 
> As mentioned in the quarterly planning, we have been working on the RPKI 
> dashboard to improve its usability and make it possible to extend its 
> functionality with new object types. We have performed a user study of the 
> existing dashboard and have started the implementation of the new dashboard.
> 
> We have been making good progress on this project and we expect to be able to 
> start public beta testing in about six weeks from now. The primary goal of 
> this beta testing is to ensure that the new dashboard works intuitively for 
> the users of the hosted RPKI service before we switch over from the current
> dashboard. Of course, we also do our own testing, but input from real users
> is invaluable. If you are interested in participating in beta testing, then 
> please let me know and I will make sure that we get in touch with you. 
> 
> Future Work
> ===========
> 
> Below, I will give an overview of several work items that we can pick up 
> after the current work on the RPKI dashboard is done. We have ideas about 
> what priorities to give to each item, but I would like to take this 
> opportunity to ask the members of this WG to speak up and share what 
> priorities they would give to each item.
> 
> - ASPA
> 
> Unfortunately, the discussion on the profile and RPKI to router payload has 
> not yet been completely settled in the IETF Sidrops WG. That said, there is 
> overwhelming support in the same WG for getting ASPA ready for deployment. 
> Furthermore, early adopters are testing ASPA

[routing-wg] RPKI ROA deployment now at 50%

2024-05-01 Thread Job Snijders via routing-wg

Dear all,

Fun news! RPKI ROAs now cover 50% of the global Internet’s routing table.
We estimate that 70% of traffic is sent towards ROV-valid destinations.

An analysis of this milestone and of the propagation of invalid routes:
https://www.kentik.com/blog/rpki-rov-deployment-reaches-major-milestone/

Kind regards,

Job
-- 

To unsubscribe from this mailing list, get a password reminder, or change your 
subscription options, please visit: 
https://lists.ripe.net/mailman/listinfo/routing-wg

[routing-wg] Call for presentation Routing WG at RIPE86 (Rotterdam, Netherlands)

2023-03-17 Thread Job Snijders via routing-wg

Dear RIPE Routing WG,

This is a call for presentation proposals for RIPE 86! The RIPE 86
meeting takes place in about 66 days: https://ripe86.ripe.net/

We ask both the Working Group and RIPE NCC for presentation proposals
for the eminent 1.5 hour Routing WG slot on Wednesday, May 24th, 2023.

The Routing Working Group is concerned with all aspects of IP routing
technologies. This includes dissemination and discussion of issues
affecting operators, new technologies and new applications of current
technologies, and discussion of concerns relevant to inter and intra-AS
routing. A non-exhaustive list of topics of interest includes BGP,
Routing Security, RPKI (ROV, ASPA, BGPsec), LSR (OSPF/ISIS), Anycast,
and BGP Monitoring (MRT, BMP).

When you submit a proposal please also include slides for the chairs to
review. If you've presented similar material elsewhere please share with
us when and where. Proposals can be sent to routing-wg-cha...@ripe.net

We look forward to seeing you all in Rotterdam!

Kind regards,

Job, Ignas, Paul
RIPE Routing Working Group Co-Chairs

[routing-wg] RPKI's 2022 Year in Review: growth & innovation

2022-12-31 Thread Job Snijders via routing-wg

Dear all,

With 2023 at our doorstep, I'd like to share some perspective on how
RPKI evolved in the year 2022.

Impact on the Global Internet Routing System
============================================

Decision makers might wonder: is investing time and resources worth it?
What is the effectiveness of RPKI Route Origin Validation (RPKI-ROV)?
In the last year a number of interesting reports were published.

Even though less than half of BGP routes are covered by RPKI ROAs [6],
Kentik estimates [2], based on flow data, that nowadays the majority of IP
traffic is destined towards RPKI-valid BGP routes. Their follow-up
report [3] (analysing BGP control-plane data) suggests that evaluation
of a BGP route as RPKI-invalid reduces its propagation by anywhere
from one half to two thirds. Cloudflare [4] published a report
analysing data-plane connectivity between a select number of ASes and
RPKI-invalid destinations: they estimate that (as a lower bound) 6.5% of
residential Internet users enjoy the benefits of their ISP doing RPKI-ROV.
Another experiment report [5] (focussed on data-plane connectivity
between validators and RPKI-valid/RPKI-invalid destinations), concluded
the existence of RPKI ROAs helped move 75% of test traffic towards the
correct destination.

The above metrics might appear all over the place (6.5% up to 75%), but
keep in mind these analyses are not mutually exclusive. Observations of
the Internet's topology are a function of the observer's vantage point.

All the referenced reports agree on key points:

  * ROAs have a measurable & significant impact on global IP traffic delivery
  * RPKI-ROV helps reduce the "blast radius" of BGP routing incidents
  * They recommend to continue the global deployment of RPKI-ROV
(rejecting RPKI-invalid BGP routes), and create ROAs for all IP
address space.

Year to Year Growth of the distributed RPKI database
====================================================

In comparison to "effectiveness", the bare existence, size, contents,
and number of Signed Objects in the globally distributed RPKI repository
system is much easier to quantify.

The below table was constructed by comparing two December 31st
RPKIviews.org snapshots [1] of validated RPKI caches, primed with the
ARIN, AFRINIC, APNIC, LACNIC, and RIPE Trust Anchors.

                                       2021-12-31      2022-12-31
Total cache size (KiB):                   996,216       1,240,572  (+24%)
Total number of files (objects):          192,503         242,969  (+26%)
Publication servers (FQDNs):                   36              52  (+44%)
Certification authorities:                 28,328          34,901  (+23%)
Route origin authorizations:              101,645         138,323  (+36%)
Unique VRPs:                              302,025         390,752  (+29%)
IPv4 addresses covered:             1,139,561,719   1,354,270,410  (+19%)
IPv6 addresses covered (x10^24):    7,499,405,083   9,446,853,925  (+26%)
Unique origin ASNs in ROAs:                27,174          34,455  (+27%)

A healthy growth rate across the board!

With the ubiquitous availability of "Publication as a Service" hosted by
RIRs, I expect (and hope!) the growth of the number of distinct
publication servers to stall, or even drop in 2023.

The number of Certification Authorities (CAs) closely corresponds to the
number of RIR members (RIR customers) who opted to enable RPKI services
for their Internet Number Resources, making it a useful proxy metric to
understand how many organisations are creating RPKI ROAs.

A single Route Origin Authorization (ROA) can contain one or more
Validated ROA Payloads (VRPs), and multiple ROAs can contain the
exact same VRP information. "Unique" in the above table indicates that
the metric's underlying data was deduplicated.

Each ROA can only contain a single Origin ASN. Multiple ROAs can refer
to the same Origin ASN value.
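The relationship between ROAs, VRPs and the "Unique" rows in the table can be sketched in a few lines of Python. This is an illustration under assumptions: the Roa class, the VRP tuple layout and the sample ROAs (documentation prefixes and ASNs) are hypothetical, and a real relying party derives VRPs from cryptographically validated objects rather than from a hand-written list.

```python
from dataclasses import dataclass
from ipaddress import ip_network
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Roa:
    asn: int                               # a ROA carries exactly one origin ASN
    prefixes: Tuple[Tuple[str, int], ...]  # (prefix, maxLength) pairs


Vrp = Tuple[str, int, int]  # (prefix, maxLength, origin ASN)


def unique_vrps(roas: List[Roa]) -> Set[Vrp]:
    """Flatten ROAs into VRPs and deduplicate them with a set."""
    vrps: Set[Vrp] = set()
    for roa in roas:
        for prefix, max_length in roa.prefixes:
            # Normalise the prefix so equivalent spellings compare equal.
            vrps.add((str(ip_network(prefix)), max_length, roa.asn))
    return vrps


roas = [
    Roa(64500, (("192.0.2.0/24", 24), ("2001:db8::/32", 48))),
    Roa(64500, (("192.0.2.0/24", 24),)),  # same VRP as in the first ROA
    Roa(64501, (("192.0.2.0/24", 24),)),  # same prefix, different origin ASN
]
vrps = unique_vrps(roas)
print(len(vrps))                          # 3 unique VRPs from 4 payload entries
print(len({asn for _, _, asn in vrps}))   # 2 unique origin ASNs
```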

Innovation through Standardisation
==================================

The IETF SIDROPS [7] working group (the designated forum in which
volunteers collaborate to define and specify open standards for RPKI and
RPKI-based technologies) was fairly productive in 2022 and managed to
publish 5 RFCs:

RFC 9286 - Manifests for the RPKI                          (revision)
RFC 9255 - The 'I' in RPKI Does Not Stand for Identity     (clarification)
RFC 9319 - The Use of maxLength in the RPKI                (clarification)
RFC 9323 - A Profile for RPKI Signed Checklists (RSCs)     (innovation)
RFC 9324 - Policy Based on the RPKI without Route Refresh  (innovation)

The above body of work consists mostly of revisions of older work or
clarifications on how to use the RPKI. To me this demonstrates a
somewhat conservative approach (rather than innovation at breakneck
speed), which I consider a good thing.

Outlook & Conclusion
====================

Now that Route Origin Validation has advanced as far as it has globally,
the next obvious target is BGP path validation, to mitigate two distinct
problems: BGP route leaks and BGP AS_PATH spoofing. Both are painful to
network

Re: [routing-wg] Proposed Service Criticality Form

2022-12-23 Thread Job Snijders via routing-wg

Dear RIPE NCC,

Thanks for offering the opportunity to share feedback. I'd like to comment
in an individual capacity on the proposed Service Criticality ratings.

Summary: I consider the proposed criticality ratings appropriate for the
 RPKI service.

Elaboration:

1/ Confidentiality
Comment: The purpose of RPKI keys is signing and signature chain
 verification, a purpose very different from encryption
 (encryption being a prerequisite to achieve
 confidentiality).

2/ Integrity
Comment: Unauthorized access to private key materials or to interfaces
 which trigger operations with private key material (such as the
 LIR portal) poses a high risk to operators. It is easy to
 imagine a multitude of scenarios where a compromise of
 integrity leads to high severity incidents.

3/ Availability
Comment: The ability to issue, revoke, and verify RPKI certificates is a
 24/7 operational aspect where any downtime stalls business
 processes. For example, being unable to rectify a misconfigured
 ROA can have very adverse impact on business. A high severity
 rating seems proper.

I'm happy to elaborate on any of the above comments if they raise
questions!

Kind regards,

Job

On Thu, Dec 15, 2022 at 04:17:48PM +0100, Nathalie Trenaman wrote:
> Dear Colleagues,
> 
> During RIPE 84 in May, we discussed the Service Criticality Framework. We 
> mentioned that we would like the Routing Working Group's input on the 
> criticality rating for the RPKI service.
> 
> Thank you to the Co-chairs for asking the working group for input in June 
> with the original form: 
> https://www.ripe.net/ripe/mail/archives/routing-wg/2022-June/004582.html
> 
> Since we did not receive much feedback so far, we hereby share our proposed 
> completed form with you (see attached PDF). We hope to receive your feedback 
> by 23 December. 
> 
> Kind regards
> Nathalie Trenaman
> RIPE NCC
> 

Re: [routing-wg] Proposed Service Criticality Form

2022-12-23 Thread Job Snijders via routing-wg

Dear all,

[[ ... If you are looking for a fun way to spend some time on your
 second-last Friday of the year ... please read on! ... :-) ... ]]

The working group is encouraged to consider commenting on the Service
Criticality Framework proposal for RPKI. Understanding the community's
wishes and expectations around RPKI Service Delivery will greatly
help RIPE NCC improve RPKI services.

Kind regards,

Job
Routing-WG Co-chair

Re: [routing-wg] RPKI ROAs and Monitoring

2022-12-12 Thread Job Snijders via routing-wg

Hi Klaus!

On Mon, Dec 12, 2022 at 12:12:03PM +0100, Klaus Darilion via routing-wg wrote:
> Until now we have not used RPKI. For us at nic.at and RcodeZero DNS we
> are not on the validating side of RPKI, but we would only create ROAs,
> using the RIPE service. I could just login to the RIPE portal and in 5
> minutes it is done. But I am a bit concerned about activating the
> service and then no longer looking after it. Hence I think we should have some
> monitoring too.

Monitoring your ROAs is a really good idea! I recommend taking a look at
this presentation https://www.youtube.com/watch?v=cJUkOu9nWT8

> We have a defined target state, eg. prefix 83.136.32.0/21 should be
> announced from AS30971. So I think our monitoring should check:
> 
> -  is there a ROA for 83.136.32.0/21 from AS30971
> -  is the ROA valid, ie. not expired
> -  Will validating ISPs accept these prefixes? Will validating
> ISPs reject this prefix if the origin AS is wrong (maybe having a local
> Routinator or querying a public service via API).

Indeed, validating ISPs will reject the BGP announcement if the Origin
AS is incorrectly configured in the ROA. Make sure to not make any
typos when creating ROAs! :-)

Here is a blog post that details the impact of misconfigured
ROAs (and, conversely, the positive impact of correctly
configured ROAs!)
https://www.kentik.com/blog/how-much-does-rpki-rov-reduce-the-propagation-of-invalid-routes/

> Do you think this makes sense? Is such monitoring already available
> and I only have to subscribe somewhere (free or commercial)? Am I missing
> something? Any hints what I should do before and after creating the
> ROAs?

Two datasets you can check for RPKI objects related to your prefixes are
https://console.rpki-client.org/dump.json.gz (all details)
and https://console.rpki-client.org/vrps.json (a condensed version).
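For the checks listed in the question, a minimal Python sketch of RFC 6811-style origin validation against the condensed VRP file could look like the following. It assumes (not verified here) that vrps.json holds a top-level "roas" list whose entries carry "prefix", "maxLength", "asn" and an "expires" Unix timestamp; adjust the field names to whatever the file actually contains. Expired entries are skipped, matching the expiry behaviour described in this thread. The sample entry uses a documentation prefix and ASN and is not real data.

```python
import json
import time
from ipaddress import ip_network
from typing import Any, Dict, List, Optional


def load_vrps(path: str) -> List[Dict[str, Any]]:
    """Load VRPs from a downloaded vrps.json (assumed top-level "roas" list)."""
    with open(path) as f:
        return json.load(f)["roas"]


def parse_asn(value: Any) -> int:
    # Assumption: the ASN may be encoded as 64500 or as the string "AS64500".
    return int(str(value).upper().removeprefix("AS"))


def origin_validation(vrps: List[Dict[str, Any]], prefix: str, origin_asn: int,
                      now: Optional[float] = None) -> str:
    """RFC 6811-style route origin validation: valid / invalid / not-found."""
    now = time.time() if now is None else now
    route = ip_network(prefix)
    covered = False
    for vrp in vrps:
        if vrp.get("expires", float("inf")) <= now:
            continue  # an expired object counts as if it does not exist
        vrp_net = ip_network(vrp["prefix"])
        if vrp_net.version != route.version or not route.subnet_of(vrp_net):
            continue  # this VRP does not cover the route
        covered = True
        if parse_asn(vrp["asn"]) == origin_asn and route.prefixlen <= vrp["maxLength"]:
            return "valid"
    return "invalid" if covered else "not-found"


# Illustrative sample (documentation prefix and ASN, not real VRPs).
sample = [{"prefix": "192.0.2.0/24", "maxLength": 24, "asn": "AS64500",
           "expires": 2_000_000_000}]
print(origin_validation(sample, "192.0.2.0/24", 64500, now=1_700_000_000))  # valid
print(origin_validation(sample, "192.0.2.0/24", 64501, now=1_700_000_000))  # invalid
```

For the target state in the question, a monitor could call origin_validation(load_vrps("vrps.json"), "83.136.32.0/21", 30971) and alert on anything other than "valid"; checking a deliberately wrong origin ASN should return "invalid".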

> PS: What happens if my ROAs expire. Will then my BGP announcements be
> ignored by validating ISPs or will it just be as if there are no ROAs
> at all?

Indeed, then it will be like there are no ROAs at all.

Kind regards,

Job

Re: [routing-wg] Call for presentation Routing WG at RIPE85 (Belgrade, Serbia)

2022-09-24 Thread Job Snijders via routing-wg

Dear RIPE Routing WG,

This is a repeat of the call for presentation proposals for RIPE 85.

The RIPE 85 meeting takes place in about 32 days: https://ripe85.ripe.net/

We ask the Working Group and RIPE NCC for presentation proposals for the
illustrious 1.5 hour Routing WG slot on Wednesday, October 26th, 2022.

When you submit a proposal, please also include slides for the chairs to
review! :-) Proposals can be sent to routing-wg-cha...@ripe.net

Kind regards,

Job, Ignas, Paul
Routing Working Group Chairs

[routing-wg] Call for presentation Routing WG at RIPE85 (Belgrade, Serbia)

2022-08-23 Thread Job Snijders via routing-wg

Dear RIPE Routing WG,

This is a call for presentation proposals for RIPE 85.

The RIPE 85 meeting takes place in about 64 days: https://ripe85.ripe.net/

We ask the Working Group and RIPE NCC for presentation proposals for the
illustrious 1.5 hour Routing WG slot on Wednesday, October 26th, 2022.

When you submit a proposal please also include slides for the chairs to
review. If you've presented similar material elsewhere please share with
us when and where. Proposals can be sent to routing-wg-cha...@ripe.net

Kind regards,

Job, Ignas, Paul (your humble Routing Working Group Chairs)

[routing-wg] Frequently Asked Questions about 2000::/12 and related routing errors

2022-07-07 Thread Job Snijders via routing-wg

Dear all,

Last night many people received "Resource Certification (RPKI)" alerts,
which in turn caused my phone to light up with questions! :-) In the
below message I'll attempt to provide an analysis of what happened and
answer frequently asked questions.

* What happened?
* Has this happened before?
* Why didn't RPKI Route Origin Validation (ROV) stop this?

What happened?
==============

As reported in the media 
(https://twitter.com/DougMadory/status/1544862409336184832) 
one Internet Service Provider announced to the world - through the BGP
protocol - that all Internet Protocol addresses contained within
2000::/12 were reachable via them. This was a routing error, an error
condition which triggered various monitoring systems around the globe.

Background: The BGP Default-Free Zone is composed of ~ 150,000 IPv6
networks originated from ~ 24,000 Autonomous Systems (ASes). The
totality of this is what forms the IPv6 Internet. The majority of these
networks have a prefix length in the range of /32 up to /48. Currently
the world's largest IPv6 assignments (of which there are very few) are
clocking in at /19. So, a /12 ("slash twelve") BGP announcement covers
an exceptionally large number of IP addresses!

This night's /12 BGP announcement covered such a large block of address
space, it happened to overlap with about 21,292 existing networks
originated by 3,697 ASes. For roughly 69% (14,695) of those networks
RPKI ROAs had been created. About 10% (2,176) of those "RPKI ROA covered
existing networks" is IPv6 space managed under the RIPE NCC umbrella.

I imagine a few hundred operators received alerts from RIPE NCC with a
suggestion to consider creating corresponding ROAs to make the
2000::/12 announcement valid; however no ISP can create such a ROA,
because no single ISP is authoritative for the entirety of that block. :)

Has this happened before?
=========================

Yes. This type of routing error happens almost annually. Some time ago
Tom Strickx reported an incident involving 2400::/12, a block which
nowadays overlaps with more than 40,000 networks! (source:
https://twitter.com/Jerome_UZ/status/1145136294835523584)
If my memory serves me right, back in 2016 AS 1299 originated both
2000::/6 and 2000::/12, later that year AS 10026 also originated
2000::/12 for a bit.

So... how exactly can this happen? 

I believe it is a mixture of user-interfaces with really sharp edges and
permissive EBGP filters.

Many router-to-router linknets are assigned a /127 [RFC 6164] or a /64
[RFC 7421], and loopback addresses generally are assigned a /128 (a
single address).

It's not hard to imagine that when copy+pasting or typing by hand, an
operator fails to input the last digit (respectively a 7 in the case of
/127, the 4 in /64, or the 8 in /128), resulting in a configuration with
a /12 or a /6 as the prefix length.
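The failure mode is easy to reproduce with Python's ipaddress module: drop the final digit of the prefix length and mask off the host bits, and a loopback /128 (or a /127 linknet) becomes 2000::/12, while a /64 becomes 2000::/6. The length bounds in the filter below are a hypothetical example of a stricter EBGP sanity check, chosen only for illustration (the /19 lower bound mirrors the largest assignments mentioned above); they are not a recommendation from this message.

```python
from ipaddress import IPv6Network, ip_network


def drop_last_digit(prefix: str) -> IPv6Network:
    """Simulate losing the final digit of the prefix length, masking host bits."""
    return ip_network(prefix[:-1], strict=False)


# Hypothetical EBGP sanity bounds (an assumption, not an RFC or vendor default):
# reject IPv6 prefixes shorter than /19 or longer than /48.
MIN_LEN, MAX_LEN = 19, 48


def passes_length_filter(net: IPv6Network) -> bool:
    return MIN_LEN <= net.prefixlen <= MAX_LEN


for intended in ("2001:67c:208c::/128", "2001:67c:208c::/127", "2001:67c:208c::/64"):
    mistake = drop_last_digit(intended)
    print(f"{intended} -> {mistake} (accepted: {passes_length_filter(mistake)})")
```

Running it shows all three mistakes (2000::/12, 2000::/12 and 2000::/6) being rejected by the example filter, whereas a permissive filter would have passed them on.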

See these Cisco & Juniper terminal transcript examples for a
demonstration of failing to correctly enter the last digit of
"2001:67c:208c::/128" :

https://chloe.sobornost.net/~job/slash-twelve.txt

Why didn't RPKI ROV stop this?
==============================

Creating RPKI ROAs and performing Route Origin Validation (ROV) on
received BGP route announcements helps protect against mishaps with
unauthorized "same-length" and "more-specific" announcements.

ROV (by design) does nothing against unauthorized "larger overlapping"
route announcements (such as 2000::/12). This is because the Internet's
global routing system is based on the Longest Prefix Match (LPM)
algorithm (see https://en.wikipedia.org/wiki/Longest_prefix_match).

LPM means that as long as your certified address space is in the global
routing table, a less-specific announcement (such as 2000::/12) is not
very likely to draw IP traffic away from your network.
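Longest Prefix Match can be illustrated with a tiny routing table in Python. The prefixes, ASNs and the second destination address are hypothetical: the more-specific /32 keeps attracting traffic for its own addresses, and the erroneous /12 only wins for addresses that no more-specific route covers.

```python
from ipaddress import ip_address, ip_network
from typing import Dict, Optional


def longest_prefix_match(table: Dict[str, str], destination: str) -> Optional[str]:
    """Return the entry for the most specific prefix covering `destination`."""
    addr = ip_address(destination)
    best_len, best_entry = -1, None
    for prefix, entry in table.items():
        net = ip_network(prefix)
        # `in` is False when address and network are different IP versions.
        if addr in net and net.prefixlen > best_len:
            best_len, best_entry = net.prefixlen, entry
    return best_entry


routes = {
    "2001:db8::/32": "AS64500 (legitimate, ROA-covered /32)",
    "2000::/12": "AS64511 (erroneous /12)",
}
print(longest_prefix_match(routes, "2001:db8::1"))  # the /32 wins
print(longest_prefix_match(routes, "2001:db9::1"))  # only the /12 covers this address
```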

In incidents like these the major impact seems to be that monitoring
systems are triggered (which is appropriate!). I suspect there is
virtually no impact to business operations (fortunately!).

Questions welcome!

Kind regards,

Job

[routing-wg] RPKI Service Criticality Questionnaire

2022-06-27 Thread Job Snijders via routing-wg

Dear all,

RIPE NCC has asked the Routing WG Chairs to facilitate a working group
conversation on framing RIPE NCC's RPKI services subcomponents in terms
of criticality. 

At the bottom of this email is a form that focusses on three components:
confidentiality, integrity and availability. Each component is split
into three questions (a, b, and c), a total of 9 questions are being put
forward to the working group. We envision this process to be a public
consultation: WG participants can submit (free-form) responses, and also
chime in by replying to each other's responses; hopefully bringing us to
a degree of consensus in the coming weeks.

I believe this is a unique opportunity to help RIPE NCC! Investing our
time will, in turn, help us rely on and integrate RIPE NCC's
RPKI services in our production environments. The goal is to help
RIPE NCC develop a deeper understanding of how the moving parts fit
together, which in turn helps decide where and how to invest resources.

>>> Your feedback is much appreciated! <<<

NOTE: if you are *NOT* a RIPE NCC member, and use the RIPE NCC Trust
Anchor (e.g. as Relying Party to make informed routing decisions, inside
and outside the RIPE region), your feedback *also* is much appreciated.

Kind regards,

Job, Ignas, Paul
Routing WG co-chairs

--- FORM STARTS BELOW ---

Service Criticality Questionnaire Form - RPKI
=============================================

Introduction
------------

This form is used to gather input from the community on the service
criticality of the RPKI Service from RIPE NCC. The framework is
detailed in: https://labs.ripe.net/author/razvano/service-criticality-framework/

The service criticality has three components:

* Confidentiality: What is the highest possible impact of a data
                   confidentiality-related incident (e.g. data leak)?

* Integrity:       What is the highest possible impact of a data
                   integrity-related incident (e.g. hacking)?

* Availability:    What is the highest possible impact of a service
                   availability-related incident (e.g. outage)? (All RIPE NCC
                   services are designed with at least 99% availability, so
                   please consider outages of up to 22 hours per quarter.)
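The 22-hour figure is consistent with 1% of a calendar quarter. A small Python check, assuming an average quarter of 365.25 / 4 days:

```python
# Assumption: "per quarter" means an average quarter of 365.25 / 4 days.
HOURS_PER_QUARTER = 365.25 / 4 * 24  # 2191.5 hours


def allowed_downtime_hours(availability: float,
                           period_hours: float = HOURS_PER_QUARTER) -> float:
    """Maximum outage time compatible with an availability target."""
    return (1 - availability) * period_hours


print(round(allowed_downtime_hours(0.99), 1))  # 21.9, i.e. roughly 22 hours
```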

Service purpose
---------------

The RIPE NCC RPKI Service is the RPKI Trust Anchor (TA) for the RIPE NCC
service region, and consists of:
* RPKI Dashboard (in the LIR portal)
* Repositories (rsync/RRDP)
* Certification Authorities (CAs)
* RPKI Management API
* Hardware Security Modules (HSMs)
* Datasets

Service Criticality
-------------------

Please review the following three areas.

## (1) Global Routing

Incident Severity
* Low        (No / negligible impact)
* Medium     (One or a few ASes are unavailable)
* High       (Many ASes in a region are unavailable)
* Very High  (Global Internet routing disruptions)

Please rate the incident severity (Low to Very High) in the following
three areas. Please explain why.

(a) Confidentiality (Impact level of incidents such as data leaks)

Answer 1a:

(b) Integrity (Impact level of incidents such as hack attempts)

Answer 1b:

(c) Availability (Impact level of service outage incidents, up to 22 hours per 
quarter)

Answer 1c:

## (2) IP addresses and AS Numbers

Incident Severity
* Low        (No / negligible impact)
* Medium     (Local disruptions: registration information not being
              available for some entities)
* High       (Regional disruptions: registration information not being
              available for the RIPE NCC region)
* Very High  (Global disruptions: lack of registration information
              for all AS Numbers and IP addresses)

Please rate the incident severity (Low to Very High) in the following
three areas. Please explain why.

(a) Confidentiality (Impact level of incidents such as data leaks)

Answer 2a:

(b) Integrity (Impact level of incidents such as hack attempts)

Answer 2b:

(c) Availability (Impact level of service outage incidents, up to 22 hours per 
quarter)

Answer 2c:

## (3) Global DNS

Incident Severity
* Low        (No / negligible impact)
* Medium     (Local disruptions)
* High       (Regional disruptions)
* Very High  (Global disruptions)

Please rate the incident severity (Low to Very High) in the following
three areas. Please explain why.

(a) Confidentiality (Impact level of incidents such as data leaks)

Answer 3a:

(b) Integrity (Impact level of incidents such as hack attempts)

Answer 3b:

(c) Availability (Impact level of service outage incidents, up to 22 hours per 
quarter)

Answer 3c:

--- FORM ENDS ---

[routing-wg] state of RPKI-invalid objects in IRR databases (2022.05.16)

2022-05-16 Thread Job Snijders via routing-wg

Dear all,

On the #DENOG IRC channel I was asked for current stats on the number of
RPKI-invalid IRR route/route6 objects in various databases as follow-up
to a talk at RIPE81 [0]. I figured I should share this with the WG too.

Below is a table with today's number of invalid route/route6 objects
per IRR source, obtained by applying the RFC 6811 origin validation
algorithm with the prefix from the "route:"/"route6:" attribute and the
origin AS from the "origin:" attribute as input.

               IPv4 invalids   IPv6 invalids   Notes
AFRINIC:                 359              12   authoritative
ALTDB:                     1             191   note 4
APNIC:                 21861            1880   authoritative
ARIN:                    814              65   authoritative
BBOI:                     44               1
BELL:                    322               0
JPIRR:                    95               4
LACNIC:                    0               0   authoritative (note 3)
LEVEL3:                12925             182
NTTCOM:                65513             730
RADB:                 208901           12829
RGNET:                     2               0
RIPE:                  28390            3518   authoritative
RIPE-NONAUTH:              5               0   note 5
TC:                        0               0   note 2

Some notes on the above table:

1) ARIN-NONAUTH is not listed, ARIN deprecated this IRR source a month
   ago [2].
2) TC achieved a perfect 0/0 score by using the IRRd v4 RPKI integration
   [3].
3) LACNIC's IRR service is an information proxy for RPKI ROAs valid
   under the LACNIC Trust Anchor. This by definition means that all IRR
   objects in the LACNIC IRR database are RPKI-valid.
4) ALTDB periodically runs a script to delete RPKI-invalid objects
5) RIPE-NONAUTH imposes a two week delay before deleting RPKI-invalid
   objects, so the 5 IPv4 objects currently marked as invalid will
   disappear in the next few days, unless the covering RPKI ROAs are
   withdrawn before the timer expires.

The stats are generated by downloading the IRR database dump for each
source and running a simple python script [1].
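The script referenced as [1] is not reproduced here; the following is an independent, simplified Python sketch of the same idea. It assumes an RPSL text dump in which objects are separated by blank lines, skips comments and continuation lines, and takes VRPs as (prefix, maxLength, ASN) tuples. The sample objects use documentation prefixes and ASNs and are not real IRR data.

```python
from ipaddress import ip_network
from typing import Dict, Iterator, List, Tuple

Vrp = Tuple[str, int, int]  # (prefix, maxLength, origin ASN)


def route_objects(dump: str) -> Iterator[Tuple[str, int]]:
    """Yield (prefix, origin ASN) for each route:/route6: object in an RPSL dump."""
    for block in dump.split("\n\n"):
        attrs: Dict[str, str] = {}
        for line in block.splitlines():
            # Skip comments and RPSL continuation lines in this simplified parser.
            if not line or line[0] in "%# \t+" or ":" not in line:
                continue
            key, _, value = line.partition(":")
            attrs.setdefault(key.strip().lower(), value.strip())
        prefix = attrs.get("route") or attrs.get("route6")
        if prefix and "origin" in attrs:
            yield prefix, int(attrs["origin"].upper().removeprefix("AS"))


def is_rov_invalid(prefix: str, origin: int, vrps: List[Vrp]) -> bool:
    """RFC 6811: invalid if at least one VRP covers the prefix but none matches."""
    route = ip_network(prefix, strict=False)
    covering = [
        (max_len, asn)
        for vrp_prefix, max_len, asn in vrps
        if ip_network(vrp_prefix).version == route.version
        and route.subnet_of(ip_network(vrp_prefix))
    ]
    return bool(covering) and not any(
        asn == origin and route.prefixlen <= max_len for max_len, asn in covering
    )


def count_invalids(dump: str, vrps: List[Vrp]) -> Dict[int, int]:
    """Count RPKI-invalid route objects per IP version."""
    counts = {4: 0, 6: 0}
    for prefix, origin in route_objects(dump):
        if is_rov_invalid(prefix, origin, vrps):
            counts[ip_network(prefix, strict=False).version] += 1
    return counts


SAMPLE_DUMP = """route:  192.0.2.0/24
origin: AS64500
source: EXAMPLE

route:  192.0.2.0/24
origin: AS64501
source: EXAMPLE

route6: 2001:db8::/32
origin: AS64500
source: EXAMPLE
"""
SAMPLE_VRPS = [("192.0.2.0/24", 24, 64500), ("2001:db8::/32", 48, 64500)]
print(count_invalids(SAMPLE_DUMP, SAMPLE_VRPS))  # {4: 1, 6: 0}
```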

Kind regards,

Job

[0]: https://ripe81.ripe.net/presentations/59-IRRd-RIPE812.pdf
[1]: https://github.com/job/irr-nonauth-cleanup
[2]: https://www.arin.net/announcements/20220128-irr/
[3]: https://irrd.readthedocs.io/en/stable/admins/rpki/

[routing-wg] follow-up on RPKI open house: are CRLs used?

2022-05-04 Thread Job Snijders via routing-wg

Hi,

A question was raised during today's RPKI Data Open House: "are CRLs used?"

Just now I ran some statistics on today's RPKI state using the 5 RIR TALs:

* There are 30,914 CRL files.
* In totality, these 30K CRLs list revocations for 331,637 serials.
* 5,369 CRL files don't list revoked serials (at this point in time).
* On average, the non-empty CRLs list 12 revoked certs.

Further analysis can be performed using this JSON representation of the
global RPKI state:

https://console.rpki-client.org/dump.json (174 megabyte)
https://console.rpki-client.org/dump.json.gz (38 megabyte)

The dump.json file is regenerated a few times a day.
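A sketch of how the four CRL figures relate to each other, in Python. The record layout (a list of CRL dicts, each with a "revoked_serials" list) is a hypothetical assumption and not the documented structure of dump.json; map it to the real fields before use. The sample records are made up.

```python
from statistics import mean
from typing import Any, Dict, List


def crl_stats(crls: List[Dict[str, Any]]) -> Dict[str, float]:
    """Summarise CRL usage; assumes each CRL record lists its revoked serials."""
    # "revoked_serials" is a hypothetical key name, not a documented dump.json field.
    counts = [len(crl.get("revoked_serials", [])) for crl in crls]
    non_empty = [n for n in counts if n > 0]
    return {
        "crl_files": len(counts),
        "revoked_serials": sum(counts),
        "empty_crls": len(counts) - len(non_empty),
        "mean_revoked_per_non_empty_crl": mean(non_empty) if non_empty else 0.0,
    }


sample = [{"revoked_serials": ["01", "02"]}, {"revoked_serials": []},
          {"revoked_serials": ["03"]}]
print(crl_stats(sample))
```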

Kind regards,

Job

[routing-wg] Fw: 2749 routes AT RISK - Re: TIMELY/IMPORTANT - Approximately 40 hours until potentially significant routing changes (re: Retirement of ARIN Non-Authenticated IRR scheduled for 4 April 2

2022-04-04 Thread Job Snijders via routing-wg

Hi all!

Sharing here as FYI. The impact of this event is very hard to
understand.

Aspect 1/ BGP routes might be impacted depending on *when* IRR mirror
operators remove the ARIN-NONAUTH from their list of sources (as far as
I understand ARIN will disable FTP/NRTM access). Commonly used mirror
operators are RADB, NTT, Lumen, and of course the local IRRd instances
that operators might be running to speed up prefix-filter generation.
The 'peval' and 'bgpq3' utilities by default point at whois.radb.net,
the 'bgpq4' utility by default points at rr.ntt.net.

Aspect 2/ Overlapping (less-specific) objects might exist in other
databases, masking this event to some degree.

All timers in this IRR supply chain are unknown variables.

Below is a list of prefixes which *might* be affected; my hope is that
this list assists in troubleshooting efforts in the coming days / weeks.

Kind regards,

Job

- Forwarded message from Job Snijders  -

Date: Mon, 4 Apr 2022 17:56:45 +0200
From: Job Snijders 
To: na...@nanog.org
Subject: 2749 routes AT RISK - Re: TIMELY/IMPORTANT - Approximately 40 hours
until potentially significant routing changes (re: Retirement of ARIN
Non-Authenticated IRR scheduled for 4 April 2022)

Dear all,

On Sat, Apr 02, 2022 at 09:09:58PM +, John Curran wrote:
> As previously reported here, ARIN will be shutting down the
> ARIN-NONAUTH IRR database on Monday, 4 April 2022 at 12:00 PM ET.
> 
> It is quite likely that some network operators will see different
> route processing as a result of this change, as validation against the
> ARIN-NONAUTH IRR database will no longer be possible.
> 
> Please be aware of this upcoming event and make alternative
> arrangements if you are presently relying upon routing objects in
> the ARIN-NONAUTH IRR database.

I ran an analysis just now in which I created an intersection between
prefixes observed in the BGP default-free zone and exactly matching
route:/route6: objects in ARIN-NONAUTH. I then subtracted exact
matching objects found in the RADB, ALTDB, TC, NTTCOM, LEVEL3, RIPE, and
APNIC IRR sources. The result is a list of routes which might
experience service disruptions due to missing IRR objects.
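The intersection-then-subtraction described above can be sketched as set operations. This assumes each source has already been reduced to (prefix, origin AS) pairs. Like the analysis, it only considers exact matches, so less-specific objects in other databases are not accounted for. The function names are hypothetical:

```python
import ipaddress
from typing import Iterable, Set, Tuple

Route = Tuple[str, int]  # (prefix, origin AS)


def normalise(routes: Iterable[Route]) -> Set[Route]:
    """Canonicalise prefix notation so exact matching compares like with like."""
    return {(str(ipaddress.ip_network(prefix)), asn) for prefix, asn in routes}


def routes_at_risk(dfz: Iterable[Route], nonauth: Iterable[Route],
                   other_sources: Iterable[Iterable[Route]]) -> Set[Route]:
    """DFZ routes whose only exact-matching IRR object is in the NONAUTH source."""
    at_risk = normalise(dfz) & normalise(nonauth)
    for source in other_sources:
        at_risk -= normalise(source)
    return at_risk


dfz = [("192.0.2.0/24", 64500), ("198.51.100.0/24", 64501), ("203.0.113.0/24", 64502)]
nonauth = [("192.0.2.0/24", 64500), ("198.51.100.0/24", 64501)]
radb = [("198.51.100.0/24", 64501)]
print(routes_at_risk(dfz, nonauth, [radb]))  # {('192.0.2.0/24', 64500)}
```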

The below 2,749 Prefix + Origin AS pairings are at risk as a result of
ARIN shutting down the ARIN-NONAUTH IRR database. Any potential effects
are likely to manifest themselves in the coming 24 - 32 hours. Prior to
this announcement, ARIN consulted with its community on the future of
its IRR service.

I voiced objection and raised concerns about (what appeared to be)
limited visibility into what exactly the effects of such a database
shutdown would mean for the Internet: 
https://lists.arin.net/pipermail/arin-consult/2021-March/001237.html
Other community members also shared concerns: 
https://lists.arin.net/pipermail/arin-consult/2021-February/001195.html
A number of graceful alternative mechanisms were proposed, but not
acted upon: https://lists.arin.net/pipermail/arin-consult/2021-March/001240.html

One might argue "well, folks had more than a year to move their
objects!", but on the other hand, it is entirely possible not all the
right people were reached, or in cases where affected parties did
receive a communication from ARIN, they perhaps were unable to
understand the message.

Please check if any of your prefixes are on the below list, and if so,
double check whether your IRR administration is able to overcome the
disappearance of ARIN-NONAUTH. Godspeed!

This tool might be useful: https://irrexplorer.nlnog.net/

Kind regards,

Job

Prefix OriginAS
100.42.100.0/24 33353
100.42.101.0/24 33353
100.42.102.0/24 33353
100.42.104.0/24 33353
100.42.105.0/24 33353
100.42.106.0/23 33353
100.42.108.0/24 33353
100.42.109.0/24 33353
100.42.96.0/23 33353
100.42.98.0/24 33353
103.11.202.0/24 33517
103.13.12.0/24 38057
103.13.135.0/24 51830
103.15.168.0/23 55532
103.196.22.0/23 7489
103.219.78.0/24 55256
103.219.79.0/24 55256
103.232.224.0/24 125
103.250.176.0/24 134795
103.250.177.0/24 134795
103.250.178.0/24 134795
103.250.179.0/24 134795
103.35.217.0/24 125
103.47.244.0/24 55256
103.88.172.0/24 136271
103.88.173.0/24 136271
103.88.174.0/24 136271
104.128.96.0/20 19233
104.142.128.0/23 33353
104.142.130.0/23 33353
104.142.136.0/22 33353
104.142.140.0/23 33353
104.142.144.0/24 33353
104.142.145.0/24 33353
104.142.146.0/24 33353
104.142.147.0/24 33353
104.142.148.0/24 33353
104.142.149.0/24 33353
104.142.152.0/24 33353
104.142.153.0/24 33353
104.142.156.0/24 33353
104.142.160.0/24 33353
104.142.164.0/24 33353
104.142.165.0/24 33353
104.142.175.0/24 33353
104.142.176.0/24 33353
104.142.177.0/24 33353
104.142.180.0/24 33353
104.142.181.0/24 33353
104.142.184.0/24 33353
104.142.185.0/24 33353
104.142.186.0/24 33353
104.142.187.0/24 33353
104.142.188.0/24 33353
104.142.189.0/24 33353
104.142.190.0/24 33353
104.142.191.0/24 33353
104.142.192.0/24 33353
104.142.224.0/24 33353
104.142.248.0/21 33353
104.142.249.0/24 33353
104.142.251.0/24 33353

Re: [routing-wg] RPKI vulnerable?

2022-02-18 Thread Job Snijders via routing-wg

Hi all,

It might be the case that the vulnerability is in the realm of disagreement
with some design choices of the past, rather than a traditional CVE hole in
one or more software packages.

I found the following paper which touches upon the “assumed trust” aspect
of RPKI in the relationship between Relying Party and Trust Anchor(s).

https://www.researchgate.net/publication/349045074_Privacy_Preserving_and_Resilient_RPKI

I’m very interested in discussion about cross-signing schemes.

Kind regards,

Job

Re: [routing-wg] RFO for RIPE NCC RPKI outage 16 February 2022

2022-02-16 Thread Job Snijders via routing-wg

On Wed, 16 Feb 2022 at 19:49, Rob Austein  wrote:

> On Wed, 16 Feb 2022 13:10:27 -0500, Job Snijders wrote:
> > On Wed, 16 Feb 2022 at 19:07, Randy Bush  wrote:
> >
> >  sra commented to me that, an rp doing protocol fall-over from rrdp to
> >  rsync, or vice versa, has to do the full download as the data structure
> >  is so different.  i.e. load spike
> >
> > Perhaps it doesn’t need to be a full load: “rsync --compare-dest”
> > (against a previously downloaded and validated set of signed
> > objects) offers a path towards optimising the protocol fall-over.
>
> Even assuming the RRDP client stores and believes the rsync URIs in
> the RRDP data stream, and further assuming that the client is clever
> enough to write out its RRDP-derived database into a directory tree
> which exactly matches an rsync filesystem layout before failing over,

The OpenBSD RPKI validator does the above, while maintaining robust
cryptographic integrity (in version 7.6 and higher). I hope other
validators take inspiration from this, similar to how we (OpenBSD) took
inspiration from the Dragon Labs implementation. Your work lives on and on,
hat tip to you Rob! :-)

> RRDP doesn't convey things like file modification dates that rsync
> needs to perform an efficient incremental transfer, so the first rsync
> pass is still going to be expensive.
>
> Not obvious to me that there's any good way to optimize this.  YMMV.
>

Ties once pointed me at the GPL rsync “-c” (checksum) option, which makes
transfers focus on content rather than on filesystem attributes. From my
(openrsync) perspective, this is still work to be done. I see a path :-)

Kind regards,

Job

Re: [routing-wg] RFO for RIPE NCC RPKI outage 16 February 2022

2022-02-16 Thread Job Snijders via routing-wg

On Wed, 16 Feb 2022 at 19:07, Randy Bush  wrote:

> thanks for the post mortem, ties.
>
> sra commented to me that, an rp doing protocol fall-over from rrdp to
> rsync, or vice versa, has to do the full download as the data structure
> is so different.  i.e. load spike

Perhaps it doesn’t need to be a full load: “rsync --compare-dest” (against a
previously downloaded and validated set of signed objects) offers a path
towards optimising the protocol fall-over.
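As a minimal sketch of that idea (not how any particular validator implements it), a relying party could build its rsync fall-over command as shown below. `--compare-dest` and `--checksum` are GPL rsync options; whether openrsync supports them depends on its version. The host and directory names are hypothetical:

```python
import shlex
from typing import List


def fallback_rsync_argv(uri: str, dest: str, validated_dir: str,
                        use_checksum: bool = True) -> List[str]:
    """Build an rsync command for an RRDP-to-rsync fall-over.

    --compare-dest tells rsync to skip files that are already identical in
    validated_dir, so only changed objects are transferred into dest.
    With --checksum, "identical" is judged by content rather than by size
    and modification time, which RRDP-derived files do not carry over.
    """
    argv = ["rsync", "--no-motd", "-rt"]
    if use_checksum:
        argv.append("--checksum")
    # rsync resolves a relative DIR against dest, so pass an absolute path.
    argv.append(f"--compare-dest={validated_dir}")
    argv += [uri, dest]
    return argv


print(shlex.join(fallback_rsync_argv(
    "rsync://rpki.example.net/repository/", "/var/cache/rpki/tmp/",
    "/var/cache/rpki/validated/")))
```

Files that are skipped do not appear in dest. A validator following this approach would therefore still have to combine both directories before validating.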

Kind regards,

Job

Re: [routing-wg] rsync://rpki.ripe.net rsyncd limits set too low?

2022-02-16 Thread Job Snijders via routing-wg

Hi Ties,

Thank you for the quick reply.

On Wed, Feb 16, 2022 at 03:32:06PM +0100, Ties de Kock wrote:
> Ouch. Fallback to rsync due to a DNS misconfiguration (which should
> have recovered).

Thanks for the confirmation. Indeed, my monitors seem to have returned
to 'all clear'.

> There are multiple instances behind a load-balancer. The current
> storage is on NFS which has a performance limitation - it peaked at
> about 80K operations/second (2m average).

Welp! That's a lot of IO.

Sharing from my own experience with a tiny publication point: I estimate
there are about 4,000 RPs deployed on the Internet. Assuming their
synchronisation attempts are evenly distributed across the hour, a
naive calculation suggests every single second a new client will
attempt to connect.
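The back-of-the-envelope arithmetic can be made explicit. Assume, as above, 4,000 RPs that each connect once per hour at evenly spread times. That gives 4,000 / 3,600 s, or about 1.1 new connections per second. By Little's law, the average number of simultaneous rsync sessions is that rate multiplied by the mean session duration. The session duration is an assumed parameter here, not a measured one, and the function names are hypothetical:

```python
def arrivals_per_second(num_rps: int, interval_s: float = 3600.0) -> float:
    """Mean connection attempts per second if every RP syncs once per interval."""
    return num_rps / interval_s


def mean_concurrent_sessions(num_rps: int, session_s: float,
                             interval_s: float = 3600.0) -> float:
    """Little's law: average sessions in progress = arrival rate * session time."""
    return arrivals_per_second(num_rps, interval_s) * session_s


rate = arrivals_per_second(4000)
print(f"{rate:.2f} new connections per second")
# Mean session length at which average load alone reaches a 150-connection cap:
print(f"{150 / rate:.0f} seconds")
```

Under these assumptions, a limit such as the 150 connections reported in this thread is reached on average once a sync takes longer than roughly 135 seconds. Bursts above the average would hit it sooner.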

> We will follow up with a more detailed post-mortem.

Much appreciated!

Kind regards,

Job

Re: [routing-wg] rsync://rpki.ripe.net rsyncd limits set too low?

2022-02-16 Thread Job Snijders via routing-wg

On Wed, Feb 16, 2022 at 03:05:30PM +0100, Job Snijders wrote:
> However, it seems RIPE NCC adjusted the default rsyncd settings and
> lowered the concurrent connection count from 200 (which already is too
> low for RPKI Repository Servers) to 150?

Small correction: I was mistaken about 200 being the default;
according to the documentation, the default is 'unlimited'.

Kind regards,

Job

[routing-wg] rsync://rpki.ripe.net rsyncd limits set too low?

2022-02-16 Thread Job Snijders via routing-wg

Hi all,

I noticed the RIPE NCC RRDP service (https://rrdp.ripe.net/) became
unreachable at 2022-02-16 13:34:10 UTC+0 (and still is down).

This RRDP outage event should not pose an issue for most RPKI
validators, because most RPKI cache implementations (which follow best
practices) will attempt to synchronize via RSYNC in case RRDP is
unavailable.

However, it seems RIPE NCC adjusted the default rsyncd settings and
lowered the concurrent connection count from 200 (which already is too
low for RPKI Repository Servers) to 150?

$ rsync --no-motd -rt rsync://rpki.ripe.net/repository/
@ERROR: max connections (150) reached -- try again later
rsync error: error starting client-server protocol (code 5) at main.c(1666) [Receiver=3.1.2]

I'm not familiar with the RIPE RPKI RSYNC service architecture, so the
above error could be misleading: perhaps there is a loadbalancer
distributing TCP sessions across multiple backends, each backend
configured to serve up to 150 clients? Or perhaps there is a single
rsyncd instance (in which case 150 definitely is too low).

Is the RIPE NCC RPKI RSYNC service underprovisioned? If yes, why?

Kind regards,

Job

Re: [routing-wg] Open-sourcing of the RIPE NCC’s RPKI core software

2022-02-09 Thread Job Snijders via routing-wg

Dear RIPE NCC RPKI team,

On Wed, Feb 09, 2022 at 10:26:14AM +0100, Bart Bakker wrote:
> We are pleased to announce that we have published the source code used
> by the RIPE NCC for the RPKI back-end (the RPKI core) under the
> 3-Clause BSD licence on Github: https://github.com/RIPE-NCC/rpki-core
> The RPKI core is the RIPE NCC's software for creating and maintaining
> RPKI objects based on the registry's current status and publishing
> these in the repositories.

Congratulations on this accomplishment and achieving this milestone!

https://sobornost.net/~job/clap.gif :-)

In the realm of cryptography, full transparency (unlimited and
unrestricted access to source code) is a critical cornerstone for
building systems that can be relied upon.

> The RIPE NCC hosts the authoritative repository internally. We use the
> repository on Github to publish the source code externally. The first
> commit is identical to the source code in the RIPE NCC's internal
> repository at the time of that commit. The changes between releases
> are squashed and published to this repository on deployment, and the
> `main` branch reflects the code used by the production CA.

Am I right in assuming that - going forward - commits won't be squashed
(more than needed)? I imagine it'll be educational for the community to
be able to follow the train of thought and storyline of future
developments.

> We encountered several challenges while preparing this project for an
> open-source release. The main challenges were that the system uses
> proprietary elements that were part of the revision history and cannot
> be made public. Furthermore, it was not possible to review all
> historic commits. We plan to present our challenges while
> open-sourcing this project at RIPE 84.

I look forward to the stories.

Kind regards,

Job

Re: [routing-wg] Penetration Test Report for RPKI

2021-12-21 Thread Job Snijders via routing-wg

On Tue, Dec 21, 2021 at 01:23:01PM -0800, Randy Bush wrote:
> > We hope you will find these reports useful
> 
> very much so.  thank you.

Yes, I'd like to echo what Randy says. Thanks for sharing this.

> btw, re RIPE-009 - Unencrypted Communication
> 
> in the up/down protocol, objects are cms wrapped and hence signed and
> object authenticated; i.e. i would not panic about transport cia. 

Indeed. But I can imagine that in a world where virtually all
(originally HTTP-only yolo) APIs now have been migrated to HTTPS, any
API which ** by design ** is HTTP-only, would indeed stand out to
pentest researchers.

I think it is good the testers noticed this aspect, and also good that
RIPE NCC noted in the response "Up-down remains on HTTP and uses a CMS
wrapper for authentication."

The up/down protocol is somewhat similar in terms of security
considerations to how one can transport signed RPKI data from
Publication Point (repositories) to Relying Party (validator
instances). In that context too, the use of unencrypted transport (like
RSYNC, or PIGEON) is deemed acceptable because the threat model is based
on a robust interpretation of object-security** to such an extent that
transport-security is inconsequential.

>  otoh, i suspect there could be a path to move your delegated CAs to
>  TLS; which might be conservative in the long run.

Would you mind elaborating on what you mean by the phrase "might be
conservative in the long run"?

Kind regards,

Job

** One crucial cornerstone of the concept of 'RPKI object security' is
   a thing called "RPKI Manifests". Manifests are an elegant and very
   powerful idea in the X.509 universe: the ability to securely group 
   objects together. All modern validators use manifests: make sure your
   validator is updated to the latest version! Read more about what
   Manifests are here:
   https://datatracker.ietf.org/doc/html/draft-ietf-sidrops-6486bis-09
   This doc is now going through IETF last-call.
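To make the "grouping" idea concrete: a manifest's payload lists file names together with their SHA-256 hashes, represented here as hex digests. The following is a minimal sketch that compares what was fetched against such a list. It assumes the manifest has already been parsed and its CMS signature verified; both steps are omitted. The function name is hypothetical, and how a validator reacts to each problem is governed by the manifest specification and its own policy, not by this sketch:

```python
import hashlib
from typing import Dict, List


def check_against_manifest(manifest: Dict[str, str],
                           files: Dict[str, bytes]) -> List[str]:
    """Compare fetched files with a {file name: SHA-256 hex} manifest listing."""
    problems = []
    for name, expected in sorted(manifest.items()):
        data = files.get(name)
        if data is None:
            problems.append(f"missing: {name}")
        elif hashlib.sha256(data).hexdigest() != expected.lower():
            problems.append(f"hash mismatch: {name}")
    # Objects in the repository that the signed manifest does not vouch for.
    for name in sorted(set(files) - set(manifest)):
        problems.append(f"not on manifest: {name}")
    return problems


listing = {"a.roa": hashlib.sha256(b"roa").hexdigest(),
           "b.crl": hashlib.sha256(b"crl").hexdigest()}
print(check_against_manifest(listing, {"a.roa": b"evil", "b.crl": b"crl", "x.roa": b"x"}))
```

Because the listing itself is signed, substituting, dropping or adding objects in transit becomes detectable regardless of the transport used.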

[routing-wg] RPKI ROA MaxLength - feature or misfeature? (UX/security)

2021-12-10 Thread Job Snijders via routing-wg

Hi all,

I'm writing the working group to initiate some conversation about a
long-standing point of confusion in the RPKI ecosystem: the ROA
MaxLength field.

What is the ROA MaxLength field?
================================

The data format profile of RPKI ROAs allows an operator to specify the
following fields:

* 1 (one) Origin AS
* one or more IPv4 or IPv6 prefixes
* for each IP prefix, a so-called 'MaxLength' value

Operators are allowed to create multiple ROAs with different Origin ASNs
covering the same prefix; folks can mix-and-match as needed. The
"MaxLength" feature essentially is a macro function (a 'shortcut'): when
you create a ROA with the following parameters:

Prefix: 2001:67c:208c::/48
MaxLength: 50
Origin AS: 15562

The above Prefix + MaxLength has exactly the same meaning as:

Prefix: 2001:67c:208c::/48 or
        2001:67c:208c::/49 or 2001:67c:208c:8000::/49 or
        2001:67c:208c::/50 or 2001:67c:208c:4000::/50 or
        2001:67c:208c:8000::/50 or 2001:67c:208c:c000::/50
Origin AS: 15562
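The expansion is mechanical, as a short sketch using Python's standard ipaddress module shows. For a prefix of length p and a MaxLength of m, it yields 2^(m-p+1) - 1 prefixes, so the example above expands to 7. The helper name is hypothetical:

```python
import ipaddress
from typing import List


def expand_maxlength(prefix: str, max_length: int) -> List[str]:
    """List every prefix that a ROA entry (prefix, MaxLength) authorises."""
    net = ipaddress.ip_network(prefix)
    if not net.prefixlen <= max_length <= net.max_prefixlen:
        raise ValueError("MaxLength must lie between the prefix length and the address size")
    covered = [str(net)]
    for length in range(net.prefixlen + 1, max_length + 1):
        covered.extend(str(subnet) for subnet in net.subnets(new_prefix=length))
    return covered


for p in expand_maxlength("2001:67c:208c::/48", 50):
    print(p)
```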

The confusion & a UX experiment proposal
========================================

I suspect that many people think that "xxx/48 maxlength 50" means "the
/48, AND the four individual /50s" (mentally skipping over the
intermediate /49s). As far back as 2011 [1], the concept of
"MaxLength" appeared less than straightforward, and finding a good
'default setting' seems a challenge.

My experience at NTT taught me that encouraging customers to create IRR
"route:" or "route6:" objects that *exactly* match what people intend to
announce in the BGP plane, greatly simpifies things. Just register what
you want to announce, nothing more, nothing less.

A proposal for UX experiment: would it be beneficial to HIDE the
'maxlength' field (for some period of time) in the RPKI ROA management
system hosted by RIPE NCC? If the option isn't there, it can't confuse
people. Wouldn't it be better to encourage people to create ROAs that
align one-to-one with BGP announcements? (keep in mind: IRR route/route6
objects don't have the notion of maxlength).

Or an enhancement: a button "also create ROAs for all /24s and /48s, but
not the intermediate prefix lengths". This saves people a lot of
clicking if they want to prepare for maximum de-aggregation.
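Here is a sketch of what such a button might generate. It assumes /24 and /48 are the most-specific lengths generally accepted in the global routing table, and the helper name is hypothetical. Each returned prefix would become its own ROA entry with MaxLength equal to its prefix length, so no intermediate lengths are authorised:

```python
import ipaddress
from typing import List


def aggregate_plus_most_specifics(aggregate: str) -> List[str]:
    """The aggregate plus every /24 (IPv4) or /48 (IPv6) inside it."""
    net = ipaddress.ip_network(aggregate)
    most_specific = 24 if net.version == 4 else 48
    prefixes = [str(net)]
    if net.prefixlen < most_specific:
        prefixes.extend(str(s) for s in net.subnets(new_prefix=most_specific))
    return prefixes


print(aggregate_plus_most_specifics("2001:db8::/47"))
# ['2001:db8::/47', '2001:db8::/48', '2001:db8:1::/48']
```

For large aggregates the list grows quickly: an IPv6 /32 yields 65,536 /48s. That is the trade-off against a single MaxLength entry.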

Is MaxLength used in the wild?
==============================

Only 15% of Validated ROA Payloads (VRPs) under the RIPE NCC Trust
Anchor have the MaxLength field set to something other than the
aggregate Prefix Length. 
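The 15% figure is a simple ratio. The sketch below assumes VRPs exported as (prefix, MaxLength) pairs; the origin AS is irrelevant for this count, and the function name is hypothetical:

```python
import ipaddress
from typing import Iterable, Tuple


def share_using_maxlength(vrps: Iterable[Tuple[str, int]]) -> float:
    """Fraction of (prefix, MaxLength) VRPs whose MaxLength differs from the prefix length."""
    vrps = list(vrps)
    if not vrps:
        return 0.0
    differing = sum(1 for prefix, max_length in vrps
                    if max_length != ipaddress.ip_network(prefix).prefixlen)
    return differing / len(vrps)


sample = [("192.0.2.0/24", 24), ("2001:db8::/32", 48),
          ("198.51.100.0/24", 24), ("203.0.113.0/24", 24)]
print(share_using_maxlength(sample))  # 0.25
```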

I'm not entirely convinced that accommodating the 15% is worth the
hassle of explaining what the heck MaxLength is. Removing MaxLength from
the UI does not in any way impact anyone's ability to create as many
ROAs as they deem fit, it just forces people to be precise! :-)

Thoughts?

Kind regards,

Job

[1]: 
https://labs.ripe.net/author/alexband/using-the-maximum-length-option-in-roas/

Re: [routing-wg] Code Audit Report for RPKI

2021-12-09 Thread Job Snijders via routing-wg

Dear Bart, RIPE NCC RPKI team,

On Fri, Dec 03, 2021 at 12:47:05PM +0100, Bart Bakker wrote:
> Continuing from the work we started last year on strengthening our
> security compliance, we have asked an external party to carry out a
> security audit of our RPKI code. This was an important element in
> preparation for open sourcing the RPKI core code, which will be done
> in early January 2022.

That is welcome news!

> We are publishing the security report for the second year in an effort
> to increase transparency and trust in the RPKI system. On our website
> [0], you will now find the code audit report written by Radically Open
> Security 2021 and our response to their findings.
> 
> We hope you will find these reports useful, and we look forward to
> your feedback.
> 
> [0] - 
> https://www.ripe.net/manage-ips-and-asns/resource-management/rpki/security-and-compliance

Thank you for sharing this. Both the audit report and the response to
the audit report seemed comprehensive and informative.

Out of curiosity, will RIPE NCC employ a different (new) auditor in
2022? Periodically changing auditors can potentially help increase the
diversity in terms of perspective on code and security. Each auditor
represents 'fresh eyes', a useful characteristic when dealing with
complex systems.

Kind regards,

Job

Re: [routing-wg] Add BGPsec support to Hosted RPKI?

2021-10-11 Thread Job Snijders via routing-wg

On Mon, Oct 11, 2021 at 11:33:40AM +0200, Tim Bruijnzeels wrote:
> Why now?

There are published RFCs and running code. Time for the next step.

> RIPE NCC may have substantial resources, but they are applied
> sequentially. Perhaps RIPE NCC can shed a light on the effort
> involved, but I suspect it's more than we might think. 

It is not clear to me what you mean by "more than we might think".
I think a standard PKCS#10 / PKCS#7 exchange is less involved than
implementing support for an entirely new Signed Object profile?

Additionally, to implement BGPsec support in the hosted environment, the
developers can take inspiration from prior art. For ASPA no examples
exist yet.

> In that context, I am not against BGPSec as such, there are just
> things that I would like to see first:

Thank you for sharing your personal wishlist. The above 'context' seems
to depend on an assumption that work progresses sequentially.

> 1. Publication Service
> 
> I think this has immediate applicability to anyone considering running a
> delegated CA. It's also in the interest of the ecosystem to limit the
> excessive growth in the number of repositories.
>
> 2. ASPA
> 
> This is in draft status, but so were ROAs when the production system
> launched in January 2011.

I'll share with the working group a brief overview of where ASPA is, which
should help understand why ASPA probably is more of a 2022/2023 project.
I'm personally involved as co-author of the ASPA specification.

* ASPA is 2 drafts, 0 RFCs:
  https://datatracker.ietf.org/doc/search/?name=aspa=on=on
  This means that ASPA has not yet received review from the wider
  IETF community.

* As of yet no codepoints have been assigned to ASPA
  https://www.iana.org/assignments/rpki/rpki.xhtml

* No RPKI-to-Router protocol extension has been proposed for ASPA.

* At the IETF 111 SIDROPS meeting it was suggested to first
  construct a testbed before moving ASPA forward. See slides 11 and
  onwards: 
https://datatracker.ietf.org/meeting/111/materials/slides-111-sidrops-running-code-sidrops-00
  The testbed does not exist yet: https://github.com/SIDROPS/ASPA-testbed

* ASPA's running code status page still is empty
  https://trac.ietf.org/trac/sidrops/wiki ("AspaImplementations?" is a
  non-existing wiki page)

* Only a few months ago an issue was discovered in the ASPA
  verification algorithm in relationship to IX Route Servers. This
  has since been resolved (in -07 of the draft). To me it is an
  indicator that ASPA's specification still is in flux.

All in all, ASPA undeniably is making progress. I would love for an
RPKI-based routing policy signaling mechanism to exist, but there is a
lot of work yet to be done.

I would suggest starting a discussion about adding ASPA to the RPKI
Quarterly planning as soon as the ASPA drafts have passed at least
"IETF Working Group Last Call".

Kind regards,

Job

Re: [routing-wg] Support for "Publish in Parent" [RPKI RFC 8181]?

2021-10-07 Thread Job Snijders via routing-wg

On Thu, Oct 07, 2021 at 04:30:58PM +0200, Tim Bruijnzeels wrote:
> If this is added to the RIPE NCC RPKI backlog then I would also
> request that LIRs, and PI holders, can have multiple CAs publish at
> the RIPE NCC. The reason for this is that one of benefits of running a
> delegated CA lies in having the option to sub-delegate to child CAs.
> For example one can create child CAs with specific sub-sets of
> resources for departments, business units etc. To make this scale it
> would be very beneficial if those children could publish under the
> publication server as well.

You make a good point.

Kind regards,

Job

Re: [routing-wg] Add BGPsec support to Hosted RPKI?

2021-10-06 Thread Job Snijders via routing-wg

On Wed, Oct 06, 2021 at 04:08:00PM +0200, Tim Bruijnzeels wrote:
> Contrary to Route Origin Validation (with ROAs) there is no 'not
> found' state. 

I don't think it is helpful to attempt to put BGPsec and ROAs in the
same equivalence class, draw parallels and then conclude that the
'not-found' state is something problematic that is lacking in BGPsec.
The concepts and designs of both technologies are very different.

> This means that if there is large scale issue with RPKI itself or your
> ability to validate RPKI data, BGPSec will end up saying your path is
> invalid. I think this is a rather scary property.

Indeed, BGPsec has a hard dependency on the RPKI being up and healthy.
This is an unavoidable consequence of the design decision to make one
technology (BGPsec) dependent on another technology (the RPKI
framework).

The same of course applies to Route Origin Authorizations: if there is a
large scale issue with the RPKI, one's ability to work with given RPKI
data is impacted. I think the RIRs and NIRs are increasingly
understanding that their RPKI services are expected to perform
flawlessly. Great operational discipline is expected from Trust Anchors.
(this applies to the TLS WebPKI too).

At the end of the day, BGPsec (and RPKI) will not appeal to everyone or
be applicable to every possible situation; that's OK.

> @2- incremental deployment is hard
>
> BGPSec validation can only result in 'valid' if ALL ASNs on the path
> sign. Until that time the path will be 'invalid'. So BGPSec validation
> can only really be turned on after this point has been reached, and
> until this point has been reached there is no benefit and therefore no
> incentive to operators to buy BGP hardware that supports BGPSec, and
> publish their router keys as BGPSec certificates.

In practice the characteristic that you describe means that BGPsec
deployment can happen incrementally on (for example) private peering
between two companies. Indeed, not on 'full table transit' sessions.
For example, on large-scale cloud-provider-to-cloud-provider peering
sessions, there oftentimes are no downstream ASNs to be seen on either
side. The traffic volumes are high, the number of routes on each side
fairly low.

As BGPsec-signed paths cannot traverse non-BGPsec topology, partial
BGPsec deployment forms islands of assured paths. As islands grow to
touch each other, they become larger islands.

To do incremental deployment, both sides simply need to agree to use
BGPsec, and not permit non-BGPsec sessions to establish at the
particular intersection point. Keep in mind that a possible solution to
prevent 'downgrade attacks', is to not tolerate downgrades...

An analogy: I don't think anyone is expecting a BGP session to establish
if there is a mismatch in TCP-MD5, TCP-AO, or IPsec configuration
between two peers. The goal is for sessions NOT to establish if the
password is wrong.

> Because of the above I don't think that adding BGPSec support in the
> hosted interface will help. Don't get me wrong.. I would *love* for
> BGPSec to succeed. 

I believe the path to success starts with actually making the technology
available to increasingly larger groups of people. Literally making it
"as hard as possible" to deploy BGPsec (aka, "maintaining the status
quo"), will unsurprisingly lead to BGPsec 'not succeeding'. I don't know
what the future holds and whether BGPsec will 'succeed', but I do know
there is only one way to find out: making an honest effort to make it
work.

[ anecdote: I remember that in the early days of IPv6 it was quite hard
to get IPv6 blocks in the RIPE region. To receive an IPv6 PI block, you
had to be BGP multi-homed. This requirement did not exist for IPv4 PI
space. Consequently, many people continued to request IPv4 PI blocks not
spending any time on IPv6, because the RIR wouldn't give them IPv6 space
to deploy. ]

> I would like to be proven wrong in my interpretation. But as it stands
> I think a fundamental discussion is needed (in the IETF as well) on
> how it can be made incrementally deployable - such that there is
> benefit to early adopters - and get a safe landing in case of errors.
> If this can be achieved, or if someone can explain how this is already
> achieved, then I would be much less skeptical.

I don't know what your interpretation is based on, we clearly lack
common experience and perspective on BGP routing.

As for a 'safe landing' (a nice-sounding phrase): in DNSSEC there are
no safe landings either. It is possible to productively use and operate
systems in which the only 'safe landing' would be to disable the system
entirely.

I recommend everyone to read https://datatracker.ietf.org/doc/html/rfc8374
and https://datatracker.ietf.org/doc/html/rfc8207 to get a feel for why
some choices were made and what gotchas exist.

Kind regards,

Job

Re: [routing-wg] Add BGPsec support to Hosted RPKI?

2021-10-05 Thread Job Snijders via routing-wg

On Mon, Oct 04, 2021 at 11:48:12PM +0330, Ehsan Ghazizadeh wrote:
> Its an old doc worth reading.

You are offering the working group information from 2009. The same year
"Call of Duty: Modern Warfare 2" was released.

Since then, a number of IETF-consensus documents have been published.
For example the BGPsec specification itself. Here is a timeline:

Feb 2014, RFC 7132 - Threat Model for BGP Path Security 
Aug 2014, RFC 7353 - Security Requirements for BGP Path Validation
Sep 2017, RFC 8205 - BGPsec Protocol Specification
Sep 2017, RFC 8206 - BGPsec Considerations for Autonomous System (AS) Migration
Sep 2017, RFC 8207 - BGPsec Operational Considerations
Sep 2017, RFC 8208 - BGPsec Algorithms, Key Formats, and Signature Formats
Sep 2017, RFC 8209 - A Profile for BGPsec Router Certificates, Certificate Revocation Lists, and Certification Requests
Apr 2018, RFC 8374 - BGPsec Design Choices and Summary of Supporting Discussions
Jun 2019, RFC 8608 - BGPsec Algorithms, Key Formats, and Signature Formats
Aug 2019, RFC 8634 - BGPsec Router Certificate Rollover
Aug 2019, RFC 8635 - Router Keying for BGPsec

If at this point there still are undocumented gotchas, they aren't
gonna be found in a vacuum. Lowering barriers (by for example making it
easier to manage BGPsec in the RPKI dashboard) will increase the number
of people able to take a look at BGPsec, and subsequently improve the
technology.

Kind regards,

Job

Re: [routing-wg] Add BGPsec support to Hosted RPKI?

2021-10-04 Thread Job Snijders via routing-wg

Hi Ehsan, working group,

On Mon, 4 Oct 2021 at 14:17, Ehsan Ghazizadeh  wrote:

> As far as i know, no vendor supports bgpsec, so what's the point of adding
> bgpsec support to hosted rpki?
>

There already are multiple RPKI validators which support BGPsec, multiple
signers, and multiple BGPsec-capable BGP implementations. Whether one likes
the currently available choices is of course a somewhat subjective matter.
:-)

BGPsec - at present - definitely isn’t the operators “go to” tool; but the
specification has been published via the IETF RFC standards track, received
significant scrutiny, and multiple independent implementations have been
produced. It takes a lot of community effort to go from 0 to 1, and from 1
to 100.

I think adding BGPsec support to hosted RPKI management dashboards might
help make BGPsec more mainstream, in turn increasing demand for additional
(commercial off the shelf) implementations. The effects of obstacles to
deployment often appear to compound.

> also cause of encryption/decryption process via async encryption method,
> it's a resource intensive process so not all routers are able to handle it,
> also the more important part is bgpsec changes the normal behavior of bgp,
> for instance, update packing (update group) will be disabled.
>

Indeed, it is always important to use equipment suitable for the job at
hand. It might make sense to keep an eye out for BGP routers with AVX512
support in their CPU, rather than attempting to retrofit this type of tech
onto 32-bit PowerPC based platforms. :-)

> Are we just discussing the support of bgpsec certs on hosted rpki, and we
> would discuss bgpsec deployment impacts and open issues later?
>

I believe the current discussion is about the first aspect. But I love and
welcome dialogue on deployment impact and any open issues (so the community
can work on addressing each and every issue)!

Evaluating and (potentially) deploying BGPsec in production environments is
a multi-year project, just like RPKI-based BGP Origin Validation was.

Kind regards,

Job


[routing-wg] RPKI planning @ RIPE (Was: Support for "Publish in Parent" [RPKI RFC 8181]?)

2021-09-20 Thread Job Snijders via routing-wg

Dear Nathalie, group,

On Mon, Sep 20, 2021 at 03:11:22PM +0200, Nathalie Trenaman wrote:
> Please be aware that the roadmap you mentioned just shows the roadmap
> for the current quarter and not for a longer period. 

Ah, thank you for the clarification.

Are there any other items that predate the existence of the "Community
Input on Planning" table on [1]?

As some RPKI projects are multi-quarter or even multi-year projects,
it might be good to expand the community's visibility into the list of
RPKI-related work items which the RIPE NCC has to some degree accepted,
but not yet scheduled.

Kind regards,

Job

[1]: 
https://www.ripe.net/manage-ips-and-asns/resource-management/rpki/rpki-planning-and-roadmap

[routing-wg] Support for "Publish in Parent" [RPKI RFC 8181]?

2021-09-20 Thread Job Snijders via routing-wg

Hi working group,

In recent mail threads the concepts of "Hosted RPKI" and "Delegated
RPKI" came up, but as mentioned by Tim and Rubens, another flavor also
exists! A "hybrid" between Delegated and Hosted, informally known as
"publish in parent" (aka RFC 8181 compliant Publication Services).

There are multiple benefits to the general RPKI ecosystem when RIRs and
NIRs support RFC 8181:

* Resource Holders are relieved from the responsibility to operate
  always-online RSYNC and RRDP servers.

* Reducing the number of Publication servers reduces overall
  resource consumption for Relying Parties. Consolidation of
  Publication Servers improves efficiency and is generally
  considered advantageous.

* Helps avoid "reinventing the wheel": it might be better to have a
  small group of experts build a globally performant and resilient
  infrastructure that serves everyone, rather than everyone building
  the 'same' infrastructure.

Other RIRs and NIRs are also working on RFC 8181 support. RFC 8181 is
relatively new so it'll take some time before we see universal
availability.

NIC.BR (available): https://registro.br/tecnologia/numeracao/rpki/
APNIC (available): 
https://blog.apnic.net/2020/11/20/apnic-now-supports-rfc-aligned-publish-in-parent-self-hosted-rpki/
ARIN (planned): 
https://www.arin.net/participate/community/acsp/suggestions/2020/2020-1/

Is implementing RFC 8181 support something RIPE NCC should add to the
https://www.ripe.net/manage-ips-and-asns/resource-management/rpki/rpki-planning-and-roadmap
 ?

What do others think?

Kind regards,

Job

Relevant documentation:
https://datatracker.ietf.org/doc/html/rfc8181

Re: [routing-wg] Add BGPsec support to Hosted RPKI?

2021-09-20 Thread Job Snijders via routing-wg

Hi Rubens, others,

On Sun, Sep 19, 2021 at 08:06:54PM -0300, Rubens Kuhl wrote:
> Our experience in Brazil is that delegated RPKI is not much of an
> issue provided its software deployment is easy enough. Krill + Lagosta
> + Up/Down activation + Upwards ROA publishing adds to being really
> effective.

Good to hear from Brazil! Indeed a number of organizations have worked
hard to remove as many barriers as possible towards the Delegated RPKI.
Impressive progress.

My message was not intended to take away from "Delegated RPKI"
deployments (I run one myself!), but rather to suggest that the
"Hosted RPKI" Dashboard should _also_ make it possible to certify &
publish BGPsec Router keys.

I suspect that "Hosted RPKI" will always be popular: clearly many
operators feel comfortable outsourcing the issuance & publication of
their ROAs to the RIR. I think it is important to study feature gaps
between "Hosted" and "Delegated".

Kind regards,

Job

ps. Q: What about ASPA? A: Why not both? :-) I'm working to start an
IANA "Early Allocation" procedure to obtain codepoints for ASPA. When
progress has been made I'll circle back to routing-wg@ in a new thread,
unless someone beats me to it. :-)

[routing-wg] Add BGPsec support to Hosted RPKI?

2021-09-19 Thread Job Snijders via routing-wg

Dear all,

[ TL;DR: What does the working group think about supporting an extension
 to the RPKI Dashboard to enable publication of BGPsec certs? ]

At the moment the hosted "RPKI Dashboard" at https://my.ripe.net/#/rpki
only permits Resource Holders to create RPKI objects of one specific
type: ROAs. However, a wider range of RPKI cryptographic product types
also exists, for example: BGPsec Router Certificates [RFC 8209].

BGPsec is an RPKI-based technology which enables network operators to
transitively validate whether a given BGP UPDATE - indeed - passed
through the Autonomous Systems listed in the path. One way to think of
BGPsec is as an ECDSA-protected network of channels between a receiving
EBGP node and one (or many) routers in the BGP route's Origin AS.

I think BGPsec can be useful to protect "private peering" at large
scale, and another use case is to increase confidence in routing
information distributed via IXP Route/Blackhole Servers.

Right now, routing protocol researchers and network operators wishing to
publish BGPsec Router Keys also have to learn how to master "Delegated
RPKI": a deployment model with a steep learning curve. I think there are
benefits to the community if RIPE NCC appends an activity to the "RPKI
Planning and Roadmap" to implement procedures to sign and publish BGPsec
Router Keys via a PKCS#10 / PKCS#7 exchange, callable via both API and
dashboard WebUI.

What do others think?

Kind regards,

Job

Relevant documentation:
https://datatracker.ietf.org/doc/html/rfc8209
https://datatracker.ietf.org/doc/html/rfc8635

Re: [routing-wg] request for feedback: a RPKI Certificate Transparency project?

2021-09-10 Thread Job Snijders via routing-wg

Hi Tim,

> But this should start with a problem statement which is discussed in
> the IETF. The context of the RPKI standards matter and a lot of the
> contributors to those standards are not active here. 

It is not uncommon for initiatives to start in a special interest group
outside the IETF, and then later on be presented to the appropriate IETF
working group.

For example, the origins of the development of BGP Large Communities can
be traced back to a Netnod meeting [1]; later on, the design was
influenced by feedback received at Routing WG @ RIPE 72, and
then finally the specification was published as an RFC via the IETF IDR WG.

This message [2] is intended to start a conversation in the RIPE
community specifically about the topic of Certificate Transparency and
RPKI, because CT appears to have critically improved the WebPKI.

> As it stands I think that asking the RIPE NCC to make a big investment
> without further analysis is questionable. 

I agree, more study is needed before committing to big investments.
Gauging community interest is part of the exploratory phase of the
process.

> It is also not sufficiently clear to me how and why this problem is
> more urgent than other investments in RPKI, 

I don't recall anyone suggesting this is "more urgent than other
investments"?

> e.g. providing a Publication Server service for members, and investing
> in support for ASPA.

RIPE NCC maintains a list of plans here [4]. Neither a Publication Server
service nor ASPA is listed as of yet. Specifically about ASPA: as per
the IETF 111 SIDROPS meeting [3], I think ASPA is pending the
development of a testbed between various vendors, coordinated through
that IETF working group. Market forces will determine the pace at which
ASPA moves along.

And do keep in mind that deployment of ASPA would mean we (network
operators) collectively increase our dependency on the RPKI even further,
which in my opinion strengthens the case for discussing additional
oversight and auditability of Trust Anchors ... perhaps through
Certificate Transparency!

Kind regards,

Job

[1]: http://largebgpcommunities.net/2016/where-did-large-communities-start/
[2]: 
https://www.ripe.net/ripe/mail/archives/routing-wg/2021-September/004397.html
[3]: https://www.youtube.com/watch?v=DtnFulym8CQ
[4]: 
https://www.ripe.net/manage-ips-and-asns/resource-management/rpki/rpki-planning-and-roadmap

Re: [routing-wg] request for feedback: a RPKI Certificate Transparency project?

2021-09-10 Thread Job Snijders via routing-wg

On Fri, Sep 10, 2021 at 11:39:39AM +0200, Tim Bruijnzeels wrote:
> I think all would agree that transparency is good.
> 
> A key difference between RPKI and most other PKIs is that in the RPKI
> all objects are published in the open for all the see. 

Small nitpick: all objects are SUPPOSED to be published, in the open,
for all to see. However it is important to keep in mind we cannot assume
all objects were published in a way for all to see.

> As you mentioned your RPKI validator may miss intermediate state
> changes if it retrieves objects using rsync, but the RRDP protocol
> supports deltas, see [1].
> 
> I believe that transparency can most easily be achieved by ensuring
> that these deltas are preserved, and that they cannot be modified.

RRDP is an unauthenticated and unsigned protocol. It is possible for a
Publication Point to present different RRDP deltas to one RP compared to
what they present to another RP. Archiving RRDP deltas is interesting,
but IMHO happens too late in the pipeline for TA/CA audit purposes.

RRDP is not a replacement for Certificate Transparency, both
technologies solve different problems.

Kind regards,

Job

[routing-wg] request for feedback: a RPKI Certificate Transparency project?

2021-09-09 Thread Job Snijders via routing-wg

Dear all,

With summer turning to fall in the Northern Hemisphere, yet again a new
school year is ahead of us! :-) I hope you all are well.

I'm writing the group to solicit feedback for me and others to consider
during upcoming deliberations about activity plans, but even more so as
an RPKI enthusiast who is curious to learn what others see as potential
future evolutions of the RPKI technology stack.

[ TL;DR - Ask to the routing community: is there interest to coordinate
  and support an industry-wide project to introduce the principles of
  "Certificate Transparency" to the RPKI? The project size could be
  substantial, but so are the upsides. ]

Intro: global deployment & operation of the RPKI is a multi-decade project
==

Over the last 21 years this industry has collectively helped grow and
nurture 'Secure BGP' [1] into the RPKI/BGP deployment as we know it
today: the smallest and largest of networks in the Default-Free Zone
core are anchoring their BGP routing decisions to an RPKI covering 31% of
address space, which in turn helps connect billions of End Users to the
Internet.

From my personal perspective [10], the RPKI has now reached some level
of maturity. Perhaps now is the time for some of our community's focus
to shift towards designing and implementing innovations on top of the
current RPKI, without jeopardizing its current plateau of stability.

What does trusting a Trust Anchor mean?
===

Some people have (correctly!) pointed out that RPKI Trust Anchor (TA)
operators technically can issue certificates related to any Internet
Number Resource, a consequence of some people considering it necessary
for day-to-day TA operations that "all resources" [5] be subordinate to
the TA. While I am aware of some minor concerns about the "all
resources" framework (and I personally see room for improvement!), for
me the big question is not "who do I trust?", but "what did they
actually do after I started trusting them?".

In this reality where RIRs can sign "everything" and I (as RP operator) can
cryptographically verify that what I observed through periodic polling
[6] was indeed signed by my locally configured Trust Anchor(s)... one
thing seems to be missing! I don't know anything about what my RP didn't
observe! :-) Perhaps some certificates were issued and very quickly
revoked concerning subordinate Internet Number Resources of great
importance to me? How would I know if I didn't see it myself?

I don't expect Trust Anchor operators to never make any mistakes,
but I do wish to be in a position where I can assess past performance,
and can compare third-party audit logs, to inform my future decisions!
To me it seems important to increase our collective visibility into
TA/CA actions & mistakes. ("Mistakes" meaning the issuance or revocation
of certificates non-compliant with the policy outlined in RIPE-751).

Most Internet Routing incidents are analyzed after the fact through the
use of Route Views [8], RIPE RIS [9], or visualization tools like
BGPlay. Everyone being enabled to "scrub back in time" greatly enhances
our group's ability to understand what transpired and how to prevent it
going forward.

What is the RPKI equivalent of BGPlay at a cryptographically auditable
level of detail? ... maybe Certificate Transparency? [7].

Copying good ideas from other PKIs: Certificate Transparency


The RPKI is built on top of X.509 and CMS tech. Developments in other
X.509 special interest communities (such as the WebPKI [2], aka "the
https:// experts") may turn out to be amazing ideas or methods worth
copying into 'our Internet Number Resource PKI' ecosystem.

One of the inventors [3] of public-key cryptography (a core concept in
the RPKI) also came up with an idea known as "Merkle Trees" [4]. This
concept can be used to construct inter-domain "append-only" logging
facilities, which can be incredibly useful to help increase trust in a
Trust Anchor in an "assumed trust" model. I'll try to explain why below!

A key concept in Certificate Transparency is that a CA ('the signer')
- ahead of time - shares with select third parties (so-called 'CT Logs')
their commitment to sign a given digital object. After acknowledgement
from the CT Log(s), the signer proceeds to sign and publish the RPKI
object. The CT Logs use Merkle Trees to allow external auditors to
'losslessly replay' all observations of certificate issuance from a
given CT Log, and compare CT Logs with each other.
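To make the Merkle Tree idea a bit more concrete, here is a minimal
Python sketch of the Merkle Tree Hash and the inclusion ("audit path")
proof defined for WebPKI Certificate Transparency in RFC 6962. It only
illustrates the data structure: the function names are hypothetical,
log entries are treated as opaque byte strings rather than real
precertificates, and an RPKI CT log (if one were built) might differ.

```python
import hashlib
from typing import List, Sequence


def _sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hash(entry: bytes) -> bytes:
    # RFC 6962: leaves are hashed with a 0x00 prefix.
    return _sha256(b"\x00" + entry)


def node_hash(left: bytes, right: bytes) -> bytes:
    # Interior nodes use a 0x01 prefix, so a leaf can never be passed
    # off as an interior node (second-preimage protection).
    return _sha256(b"\x01" + left + right)


def _split(n: int) -> int:
    # Largest power of two strictly smaller than n (requires n > 1).
    return 1 << ((n - 1).bit_length() - 1)


def mth(entries: Sequence[bytes]) -> bytes:
    """Merkle Tree Hash over an ordered, append-only list of entries."""
    n = len(entries)
    if n == 0:
        return _sha256(b"")
    if n == 1:
        return leaf_hash(entries[0])
    k = _split(n)
    return node_hash(mth(entries[:k]), mth(entries[k:]))


def audit_path(m: int, entries: Sequence[bytes]) -> List[bytes]:
    """Inclusion proof for entry m, ordered from the leaf level upwards."""
    n = len(entries)
    if n <= 1:
        return []
    k = _split(n)
    if m < k:
        return audit_path(m, entries[:k]) + [mth(entries[k:])]
    return audit_path(m - k, entries[k:]) + [mth(entries[:k])]


def root_from_path(m: int, n: int, entry: bytes, path: Sequence[bytes]) -> bytes:
    """Recompute the tree head from a single entry plus its audit path."""
    if n == 1:
        if path:
            raise ValueError("audit path too long")
        return leaf_hash(entry)
    if not path:
        raise ValueError("audit path too short")
    k = _split(n)
    sibling, rest = path[-1], path[:-1]  # last element is the top-level sibling
    if m < k:
        return node_hash(root_from_path(m, k, entry, rest), sibling)
    return node_hash(sibling, root_from_path(m - k, n - k, entry, rest))


if __name__ == "__main__":
    log = [b"cert-1", b"cert-2", b"cert-3", b"cert-4", b"cert-5"]
    proof = audit_path(2, log)
    print(len(proof))                                                 # 3
    print(root_from_path(2, len(log), b"cert-3", proof) == mth(log))  # True
```

An auditor holding only the published tree head can check that a given
entry is in the log using a logarithmic number of hashes; because the
tree is append-only, any attempt to silently remove or rewrite an
earlier entry changes the head and shows up when logs or heads are
compared.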

Implementation of Certificate Transparency would provide this community
with something analogous to the RIPE Database "Historical Queries". The
major difference being all logged data comes with cryptographic
assurances, and the data can be hosted and audited by both RIPE NCC and
any third parties (anyone with Internet access!).

RIPE NCC sending precertificate information to CT Logs?

Re: [routing-wg] RPKI Quarterly Planning

2021-07-13 Thread Job Snijders via routing-wg

On Tue, Jul 13, 2021 at 05:25:11AM +0200, Daniel Karrenberg wrote:
> It might also be that the operational community has chosen other fora to
> discuss because this working group is not working.

What a strange thing to say. Of course there are other fora to discuss
RPKI, one of the most important ones is IETF's SIDROPS working group
(which is quite active!). 

As for the road map - RIPE NCC indicated feedback can be shared with the
routing-wg@ or with r...@ripe.net. I myself opted for the latter route
to reiterate a request to publish dashboards and graphs about the RPKI
service, which resulted in 'RPKI-2021-#01' being added to the roadmap.

The motivation behind RPKI-2021-#01 is that many IXPs offer publicly
accessible graphs, for example:

https://www.ams-ix.net/ams/documentation/total-stats
https://portal.linx.net/
https://www.jpnap.net/ix/traffic.html
https://www.netnod.se/ix/statistics
https://de-cix.net/en/locations/frankfurt/statistics

When incidents happen, these graphs enable the IX participants to
quickly understand whether 'something is wrong', because humans are
really good at pattern recognition.

I imagine that developing more insight into the RIPE NCC RPKI service
will offer the community similar benefits as what the community gleans
from these public IX stats, hence the ask for RPKI-2021-#01.

Kind regards,

Job

Re: [routing-wg] RPKI Quarterly Planning

2021-07-13 Thread Job Snijders via routing-wg

Hi,

On Mon, Jul 12, 2021 at 10:23:20AM +0200, Daniel Karrenberg wrote:
> Natanlie pointed us to
> https://www.ripe.net/manage-ips-and-asns/resource-management/rpki/rpki-planning-and-roadmap
> a while ago. Among other things this says:
> 
> “In preparation for the improved RPKI repository architecture, the
> distributed nature of the RRDP repository is going to be implemented using
> containers and krill-sync that pulls data from the centralised on-premise
> repository. This greatly simplifies smooth transitioning between publication
> servers without any downtime.
> 
> NOTE: We are not referring to cloud technologies here, just to our internal
> deployment technologies.”
> 
> The silence here worries me.

What silence?!

Over the last few months there have been quite some mail threads in this
working group about RPKI and RPKI outage incidents, and NCC staff have
provided updates during the virtual RIPE meetings in the Routing WG
slot.

To me the roadmap seems to reflect the sentiment that reliability is the
key objective at this moment in time.

> I would like to see some feedback from this group whether this is what
> you want to see happening. The RIPE Routing WG is the forum for giving
> guidance to the RIPE NCC about RPKI. I know other channels exist too
> and that is fine. I also know that individuals here seem to be happy
> with what is happening. However private channels and conversations are
> not the way RIPE does this.  This group is where the RIPE NCC looks
> for guidance and where that guidance gets properly archived and
> responded to.

To be honest I am not sure what the purpose of krill-sync is.

In May 2021 [1] extensive testing was conducted with the help of the
NLNOG RING to see if krill-sync could be used to power the RSYNC
service, but it turned out there were multiple issues with krill-sync
making it a suboptimal choice. I believe RIPE NCC ended up deploying a
different solution to serve RSYNC - and my hope is that the
recently-achieved stability is here to stay, because the current setup
seems to work quite nicely.

As for a 'hidden RRDP master', I fail to see what the benefits of
krill-sync are compared to, say, Varnish [2] (or Squid [3]), or to what
is already achieved by using a CDN to deliver the RRDP deltas. Maybe
the krill-sync reference is an outdated comment?

Kind regards,

Job

[1]: https://www.ripe.net/ripe/mail/archives/routing-wg/2021-May/004345.html
[2]: https://varnish-cache.org/
[3]: http://www.squid-cache.org/

Re: [routing-wg] request to enable ICMP echo-reply on rpki.ripe.net?

2021-05-07 Thread Job Snijders via routing-wg

On Fri, May 07, 2021 at 03:29:44PM +0200, Nathalie Trenaman wrote:
> Our ops team just enabled ICMP echo-reply on rpki.ripe.net.

Thank you. Have a good weekend!

Kind regards,

Job

Re: [routing-wg] request to enable ICMP echo-reply on rpki.ripe.net?

2021-05-05 Thread Job Snijders via routing-wg

On Wed, May 05, 2021 at 12:52:51PM +0200, Kurt Kayser wrote:
> you surely know that every enabled protocol/port is a potential threat.

Sometimes disabling a protocol or port is a potential threat (because
hindering troubleshooting efforts harms network stability).

RIPE NCC is the only RIR that does not respond to ICMP Echo Requests on
their main RPKI service.

Kind regards,

Job

$ ping -c 1 rpki.afrinic.net
PING rpki.afrinic.net (196.216.2.26): 56 data bytes
64 bytes from 196.216.2.26: icmp_seq=0 ttl=48 time=183.631 ms

--- rpki.afrinic.net ping statistics ---
1 packets transmitted, 1 packets received, 0.0% packet loss
round-trip min/avg/max/std-dev = 183.631/183.631/183.631/0.000 ms

$ ping -c 1 rpki.apnic.net
PING rpki.apnic.net (203.119.101.18): 56 data bytes
64 bytes from 203.119.101.18: icmp_seq=0 ttl=240 time=315.433 ms

--- rpki.apnic.net ping statistics ---
1 packets transmitted, 1 packets received, 0.0% packet loss
round-trip min/avg/max/std-dev = 315.433/315.433/315.433/0.000 ms

$ ping -c 1 rpki.lacnic.net
PING rpki.lacnic.net (200.3.14.185): 56 data bytes
64 bytes from 200.3.14.185: icmp_seq=0 ttl=49 time=204.922 ms

--- rpki.lacnic.net ping statistics ---
1 packets transmitted, 1 packets received, 0.0% packet loss
round-trip min/avg/max/std-dev = 204.922/204.922/204.922/0.000 ms

$ ping -c 1 rpki.arin.net
ping: Warning: rpki.arin.net has multiple addresses; using 199.71.0.150
PING rpki.arin.net (199.71.0.150): 56 data bytes
64 bytes from 199.71.0.150: icmp_seq=0 ttl=51 time=152.630 ms

--- rpki.arin.net ping statistics ---
1 packets transmitted, 1 packets received, 0.0% packet loss
round-trip min/avg/max/std-dev = 152.630/152.630/152.630/0.000 ms

[routing-wg] request to enable ICMP echo-reply on rpki.ripe.net?

2021-05-05 Thread Job Snijders via routing-wg

Hi RIPE NCC, hi all,

In today's troubleshooting adventure, an operator experienced difficulty
pinpointing where exactly a connectivity issue between them and
rpki.ripe.net (193.0.6.138 + 2001:67c:2e8:22::c100:68a) resided.

It would be helpful if the RIPE NCC re-enabled responses to ICMP echo
requests originating from the Internet. Would it be possible to
adjust the firewall settings to accommodate troubleshooting and
monitoring?

Right now connectivity testing has to be performed directly against the
rsync daemon's internet-exposed TCP port (873) - but it would be much
cheaper and faster for both the tester and the service hoster if instead
ICMP echo requests could be used as an early warning system (rather than
the rsync service itself).
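Until echo requests are answered, the TCP-level check described above
can at least be scripted. A minimal Python sketch using only the
standard library: it completes (or fails) a TCP handshake to the rsync
daemon port and closes the connection immediately, so no rsync protocol
exchange takes place. `tcp_port_reachable` is a hypothetical helper
name, not an existing tool.

```python
import socket


def tcp_port_reachable(host: str, port: int = 873, timeout: float = 3.0) -> bool:
    """Return True if a TCP handshake to host:port completes within timeout.

    Port 873 is the standard rsync daemon port. The socket is closed as
    soon as the handshake succeeds.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an
        # OSError subclass), unreachable networks and DNS failures.
        return False
```

For example, `tcp_port_reachable("rpki.ripe.net")` reports whether the
rsync port answered within three seconds; the result of course depends
on the network path at the time it runs. Note that this still touches
the rsync daemon itself, which is exactly the cost that ICMP echo would
avoid.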

$ ping -c 6 rpki.ripe.net
PING rpki.ripe.net (193.0.6.138): 56 data bytes

--- rpki.ripe.net ping statistics ---
6 packets transmitted, 0 packets received, 100.0% packet loss

The above test result differs from what one observes when sending echo
requests to molamola.ripe.net or manus.authdns.ripe.net.

Kind regards,

Job

Re: [routing-wg] TC x IRRd 4.2

2021-04-28 Thread Job Snijders via routing-wg

Dear Rubens, all,

On Tue, Apr 27, 2021 at 10:18:32PM -0300, Rubens Kuhl wrote:
> TC IRR, an IRR operator focused on Brazilian networks, just changed to
> IRRd 4.2. The new version allowed TC to deploy RPKI validation
> (thanks NTT for sponsoring that development) and expose HTTPS
> endpoints for WHOIS and submission that we hope will foster innovation
> around the database.
> 
> Every precaution was taken for this migration to be seamless for other
> IRR operators, including matching of serial numbers. Every IRR server
> that mirrored TC and supported -j status query was verified that it
> followed and still correctly follows database journals.
> 
> But if anything appears broken, please let me know or e-mail
> db-ad...@bgp.net.br.

Congratulations to you and the TC team for reaching this milestone!

TC's use of RPKI-based IRR Object filtering combined with the efforts of
NIC.BR, IX.br, and LACNIC to promote RPKI in Brazil, make the Brazilian
community a positive example of a seamless integration between IRR and RPKI.

Thank you for your efforts to increase the data quality of the TC registry.

Kind regards,

Job

[routing-wg] How BGP routes can get 'stuck' in the Default-Free Zone

2021-04-21 Thread Job Snijders via routing-wg

Dear group,

I'd like to draw your attention to an excellent article on an intricate
interaction between BGP and TCP which can result in 'zombie routes' in
the BGP Default-Free Zone.

https://blog.benjojo.co.uk/post/bgp-stuck-routes-tcp-zero-window

My current running theory on the root cause of some mishaps in the
global routing system is that certain BGP implementations can end up in
a broken state where such systems will still generate and send out
KEEPALIVE messages, but are unable to process other BGP messages (and
such a system instructs all its peers to not send new data by signalling
a zero TCP receive window). This is "Problem #1".

"Problem #2" is that almost all BGP implementations are unable to
robustly deal with systems suffering from Problem #1. Almost all BGP
implementations assume that when KEEPALIVE messages don't make it across
the wire, the remote system will initiate the session tear down. But of
course, if the remote system is in such a broken state that it can't
issue session tear downs ... the combined system state is perpetually
broken.

The Security Section of 
https://datatracker.ietf.org/doc/html/draft-spaghetti-idr-bgp-sendholdtimer
elaborates on three detrimental facets of the above situation.

It is quite rare for systems to end up in the "Problem #1" state, but
when it happens, all systems connected to the broken node probably are
better off disconnecting from such a system than perpetually forwarding
(and potentially blackholing) Internet traffic into the broken system.
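The idea of disconnecting from a peer that has stopped reading can be
sketched as a "send hold timer": if output is queued but the TCP stack
has not accepted a single byte for some period, the session is torn
down. The Python below is a hypothetical illustration of that idea, not
the algorithm text from draft-spaghetti-idr-bgp-sendholdtimer; the class
name, the injectable clock and the 480-second value in the example are
assumptions.

```python
import time
from typing import Callable


class SendHoldTimer:
    """Hypothetical detector for a BGP peer that stopped reading.

    The caller reports how many bytes the TCP stack accepted on each
    write attempt, and how many bytes are still waiting in its own
    output queue (UPDATEs, KEEPALIVEs, ...).
    """

    def __init__(self, send_hold_time: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.send_hold_time = send_hold_time
        self.clock = clock
        self.last_progress = clock()

    def on_write(self, bytes_accepted: int) -> None:
        # Any forward progress, even a partial write, restarts the timer.
        if bytes_accepted > 0:
            self.last_progress = self.clock()

    def expired(self, bytes_queued: int) -> bool:
        # With nothing queued the peer cannot be blocking us, so move the
        # reference point forward and report no expiry.
        if bytes_queued == 0:
            self.last_progress = self.clock()
            return False
        # Data is waiting but nothing has been written for too long
        # (e.g. the peer keeps advertising a zero receive window).
        return self.clock() - self.last_progress >= self.send_hold_time


if __name__ == "__main__":
    now = [0.0]  # fake clock so the example runs instantly
    timer = SendHoldTimer(480.0, clock=lambda: now[0])
    timer.on_write(19)          # a 19-byte KEEPALIVE left the socket at t=0
    now[0] = 400.0
    print(timer.expired(4096))  # False: stalled for 400 s
    now[0] = 481.0
    print(timer.expired(4096))  # True: stalled for 481 s
```

A BGP speaker would call `on_write()` after every non-blocking send and
poll `expired()` from its timer loop; when it returns True, closing the
TCP connection causes the routes learned from the stuck peer to be
withdrawn, instead of keeping them (and forwarding traffic towards it)
indefinitely.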

Kind regards,

Job

Re: [routing-wg] Issue affecting rsync RPKI repository fetching

2021-04-15 Thread Job Snijders via routing-wg

Dear Ties, group,

Thank you for the outline.

On Wed, Apr 14, 2021 at 02:33:37PM +0200, Ties de Kock wrote:
> The RPKI application does not support writing the complete repository to disk
> for each state (as needed for spooling the repository as proposed in scripts).
> Synchronously writing every state of the repository to disk is not feasible,
> given our update frequency and repository size. Functionality for
> asynchronously writing the repository to disk needs to be developed. We have
> two paths to develop this:
> - The first is a new daemon that writes to disk from the database state at a
>   set interval.
> - The second one is using RRDP as a source of truth and writing the
>   repository to disk.
> Furthermore, we would need to migrate the storage from NFS to have faster
> writes.
> 
> Both approaches need an extended period for validation and we are not able to
> deploy these within a few weeks. The latter approach (using RRDP) has less
> risk and is the option we are aiming for at the moment. We plan to release the
> new publication infrastructure in Q2/Q3 2021 and hope to migrate earlier.

The "RRDP as source of truth" approach indeed seems the more appealing
(and simpler!) option. I would encourage the NCC to follow that path.
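As a rough illustration of the "RRDP as source of truth" path, the
Python sketch below materializes an RRDP snapshot (the XML format
defined in RFC 8182) into a directory tree that an rsync daemon could
serve. The function names and the host/module/path directory layout are
assumptions for illustration, not the RIPE NCC's design; a production
version would also verify the snapshot's hash from notification.xml and
write into a fresh directory before switching over.

```python
import base64
import os
import xml.etree.ElementTree as ET

RRDP_NS = "{http://www.ripe.net/rpki/rrdp}"


def uri_to_path(root: str, uri: str) -> str:
    """Map rsync://host/module/file to root/host/module/file."""
    if not uri.startswith("rsync://"):
        raise ValueError(f"not an rsync URI: {uri}")
    parts = uri[len("rsync://"):].split("/")
    # Refuse empty, '.' and '..' segments so a hostile snapshot cannot
    # write outside 'root'.
    if any(p in ("", ".", "..") for p in parts):
        raise ValueError(f"unsafe URI: {uri}")
    return os.path.join(root, *parts)


def write_snapshot(snapshot_xml: bytes, root: str) -> int:
    """Write every <publish> object of an RRDP snapshot below 'root'.

    Returns the snapshot serial, so the caller can record which
    repository state is now on disk.
    """
    snap = ET.fromstring(snapshot_xml)
    if snap.tag != RRDP_NS + "snapshot":
        raise ValueError("not an RRDP snapshot")
    for elem in snap.findall(RRDP_NS + "publish"):
        path = uri_to_path(root, elem.attrib["uri"])
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as fh:
            # Object content is base64; b64decode skips embedded newlines.
            fh.write(base64.b64decode((elem.text or "").strip()))
    return int(snap.attrib["serial"])
```

Because the snapshot is a complete, self-describing repository state
identified by a serial, the writer never has to consult the signer's
database, which is part of what makes this path appealing.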

In the meantime, can 
https://www.ripe.net/support/service-announcements/service-announcements/current
be updated to reflect that there are known race conditions and problems
with the RIPE NCC RSYNC service?

Are there any other tweaks the NCC can think of that reduce the
operational pain? Maybe increasing the publication interval?

Kind regards,

Job

[routing-wg] RPKI: how to migrate an entire industry from RSYNC to RRDP?

2021-04-12 Thread Job Snijders via routing-wg

Hi all,

Some might be wondering what the deal is with RSYNC and RRDP. Why is it
critical to continue to support RSYNC in the mid-term? What is the
industry's plan to migrate from RSYNC to RRDP?

TL;DR - All RIRs need to support both RSYNC and RRDP until at least 2024.
      - All RRDP-capable validators need to make sure they are fully
        backwards compatible with RSYNC, until RIRs no longer observe
        RSYNC traffic.

Some background:


The development of the RPKI technology stack began more than a decade
ago in the IETF. From the start, RSYNC was the preferred synchronisation
protocol between RPKI publication servers (such as 'rpki.ripe.net') and
Relying Parties (the likes of Telia, NTT, Amazon, me, and you). RSYNC
was picked for a number of reasons: it was available & easy.

A core concept of the RPKI technology stack is that RPKI objects can
be transported via any means: carrier pigeon (RFC 2549 ;-), USB stick,
FTP, RSYNC, ... anything! This is possible because the RPKI relies
exclusively on 'object security': the RPKI objects themselves contain
all information that is required to perform X.509-based validation.

As time went by, a second approach was developed to synchronize RPs with
fresh data generated by CAs. It was recognized that where RSYNC servers
need to calculate the 'difference' between what the client has and what
the server has (right after the RSYNC client connects), such data could
also be generated a priori and stored in static files. Pre-generating
such a 'journal' of all changes in a repository is considered to be far
more efficient than calculating it on the fly. The RRDP protocol has many
appealing properties!
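To illustrate what consuming such a journal looks like on the RP side,
here is a minimal Python sketch that applies one RRDP delta file (the
RFC 8182 format) to an in-memory repository. Names are hypothetical; it
assumes the delta was already fetched and its hash checked against
notification.xml, and it skips the session_id check. The `hash`
attribute on publish/withdraw elements identifies, by hex SHA-256, the
object being replaced or removed, which lets the client notice when its
local state has diverged from the server's.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET
from typing import Dict, Optional

RRDP_NS = "{http://www.ripe.net/rpki/rrdp}"


def _check_hash(repo: Dict[str, bytes], uri: str, expected: str) -> None:
    # The hash attribute is the hex SHA-256 of the object being replaced
    # or withdrawn; a mismatch means our local copy has diverged.
    current: Optional[bytes] = repo.get(uri)
    if current is None or hashlib.sha256(current).hexdigest() != expected.lower():
        raise ValueError(f"hash mismatch for {uri}")


def apply_delta(repo: Dict[str, bytes], current_serial: int,
                delta_xml: bytes) -> int:
    """Apply one RRDP delta to an in-memory repository (URI -> bytes).

    Returns the new serial. Changes are staged on a copy, so a delta
    that fails halfway leaves 'repo' untouched.
    """
    delta = ET.fromstring(delta_xml)
    if delta.tag != RRDP_NS + "delta":
        raise ValueError("not an RRDP delta")
    serial = int(delta.attrib["serial"])
    if serial != current_serial + 1:
        raise ValueError(f"expected serial {current_serial + 1}, got {serial}")

    staged = dict(repo)
    for elem in delta:
        uri = elem.attrib["uri"]
        expected = elem.attrib.get("hash")
        if elem.tag == RRDP_NS + "publish":
            if expected is None:
                # A publish without hash must introduce a new object.
                if uri in staged:
                    raise ValueError(f"{uri} exists but no hash was given")
            else:
                _check_hash(staged, uri, expected)
            staged[uri] = base64.b64decode((elem.text or "").strip())
        elif elem.tag == RRDP_NS + "withdraw":
            if expected is None:
                raise ValueError(f"withdraw of {uri} lacks a hash")
            _check_hash(staged, uri, expected)
            del staged[uri]
        else:
            raise ValueError(f"unexpected element {elem.tag}")

    repo.clear()
    repo.update(staged)
    return serial
```

Applying deltas in serial order brings a client from its last known
state to the current one without re-transferring unchanged objects; if
a serial is missing or a hash does not match, the client would fall
back to downloading the full snapshot.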

In 2017 the RRDP protocol was published as an IETF RFC (RFC 8182), as a
'nice to have' synchronization protocol for the RPKI. Since then, more
and more Publication Server operators and RP software implementers have
worked to support the new RRDP protocol alongside the old RSYNC protocol.

With two options available, how to migrate?
---

This Gordian Knot has two aspects: all deployed RPKI repository
operators have to support RRDP, and all deployed Relying Party software
has to support RRDP. This means that for a successful transition, for a
moment in time, all stakeholders have to support both RSYNC and RRDP. The
industry has not yet reached that point. I expect this to happen
somewhere in 2023.

At the time of writing, not all Relying Party software, and not all
RPKI Repository Operators, support RRDP. Based on various communications
in the IETF it is clear everyone is working towards implementing RRDP
support. However, producing a safe implementation of RRDP is not a
trivial task; it takes time. As RSYNC existed first and RRDP came later,
everyone should be allowed ample time to make the transition.

While everyone waits for everyone to deploy RRDP capable software, the
trick is to PREFER synchronizing via RRDP (and if it fails try RSYNC).

Relevant Internet-Draft: 
https://datatracker.ietf.org/doc/draft-ietf-sidrops-prefer-rrdp/

This concept is somewhat analogous to 'Happy Eyeballs': for a period
of time many considered it advantageous for global IPv6 deployment to
give IPv6 just a little bit of a head start compared to IPv4. People
knew that purposefully degrading IPv4 would not motivate people to
embrace IPv6.

Also, preferring RRDP (and only using RSYNC in case of RRDP failure),
makes life easier for RPKI Repository operators: it should be possible
for them to temporarily disable the RRDP webserver and expect clients to
use RSYNC instead. Knowing that clients will gracefully fall back to
RSYNC lowers the barrier to deploy RRDP.
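The 'prefer RRDP, fall back to RSYNC' strategy can be sketched in a few
lines of Python. `fetch_rrdp` and `fetch_rsync` are hypothetical
callables standing in for real transport code; this is an illustration
of the idea, not code taken from rpki-client, Routinator, or the draft.

```python
import logging
from typing import Callable, Dict, Optional, Tuple

# Hypothetical transport signature: takes a URI, returns {object URI: bytes},
# raises an exception on any failure.
Fetcher = Callable[[str], Dict[str, bytes]]


def sync_repository(notify_uri: Optional[str], rsync_uri: str,
                    fetch_rrdp: Fetcher,
                    fetch_rsync: Fetcher) -> Tuple[str, Dict[str, bytes]]:
    """Prefer RRDP; use RSYNC only when RRDP is unavailable or fails."""
    if notify_uri:  # the CA advertises an RRDP notification URI
        try:
            return "rrdp", fetch_rrdp(notify_uri)
        except Exception as err:  # HTTPS error, bad XML, hash mismatch, ...
            logging.warning("RRDP sync of %s failed (%s); trying RSYNC",
                            notify_uri, err)
    # Either no RRDP was offered or it failed: RSYNC keeps data fresh.
    # Any exception raised here propagates to the caller.
    return "rsync", fetch_rsync(rsync_uri)
```

Returning the transport name alongside the data makes it easy to count
how often the fallback is used, which is the kind of signal Repository
Operators would watch before considering RSYNC decommissioning.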

Once all RP implementations have embraced the 'prefer RRDP' strategy,
and those implementations have trickled down into the hands of network
operators and been deployed in the field, Repository Operators will observe
fewer and fewer clients connecting to the RSYNC service and more and more
syncing via RRDP, to the point where it becomes self-evident that publication
via RSYNC can perhaps even be decommissioned altogether.

TL;DR - general availability of software which prefers RRDP over RSYNC,
combined with patience, should be sufficient of a plan to migrate! :)

Current status
--

Current versions of OpenBSD rpki-client support RSYNC. The team is
actively working to also support RRDP. The hope is to release a stable
version later this year. OpenBSD supports releases for one year, thus
any deprecation of RSYNC services should be postponed at least until
Spring 2023 to avoid disenfranchising existing deployments in the field.

The RIPE NCC Validator RRDP implementation is broken. It is trivial for
any RRDP Repository Operator to remotely crash the entire RIPE NCC
validator process. Luckily the software is almost End-Of-Life and soon
won't be relevant anymore.

Current versions of Routinator are unable to fall back to RSYNC. However,
in November 2020, the team indicated they would fix this in the next
release (which has

Re: [routing-wg] Issue affecting rsync RPKI repository fetching

2021-04-12 Thread Job Snijders via routing-wg

On Mon, Apr 12, 2021 at 02:12:10PM +0100, Nick Hilliard wrote:
> Erik Bais wrote on 12/04/2021 11:41:
> > This looks to be a 3 line bash script fix on a cronjob …  So why
> > isn’t this just tested on a testbed and updated before the end of
> > the week ?
> 
> cache coherency and transaction guarantees are problems which are
> known to be difficult to solve.  Long term, the RIPE NCC probably
> needs to aim for the degree of transaction read consistency that might
> be provided by an ACID database or something equivalent, and that will
> take time and probably a migration away from a filesystem-based data
> store.
> 
> So the question is what to do in the interim?  The bash script that
> Job posted may help to reduce some of the race conditions that are
> being observed, but it's unlikely to guarantee transaction consistency
> in a deep sense. Does the RIPE NCC have an opinion on whether the
> approach used in this script would help with some of the problems that
> are being seen and if so, would there be any significant negative
> effects associated with implementing it as an intermediate patch-up?

Perhaps the script [0] can be of use, or perhaps not. The script assumes
a POSIXish-compliant environment. It is not clear to me what software
process runs where and how RIPE NCC runs their publication service.

The core problem seems to me that while RSYNC clients are connected the
RIPE NCC RPKI server appears to 'pull the rug' from underneath them.
This practice reduces the reliability of the RIPE NCC RPKI service.

I can only guess how the RIPE NCC RPKI publication service exactly is
configured, but I imagine there is a 'Signer Server' which writes to
disk the few thousand individual RPKI objects, and separately there is a
RSYNC server (rpki.ripe.net) which serves the files to RSYNC clients.
Transferring sets of inter-related files around is a 'batch' operation;
the pipeline should be set up accordingly.

As such, calling 'rsync' from crontab to populate the rpki.ripe.net
rsync server would likely lead to inconsistent results.

There are (at least) two objectives to keep in mind:

1/ While the Signer software is writing new files out to disk, the
'signer to publisher' replication process should not run, because the
signer isn't finished yet.

2/ While a given RSYNC client is fetching from 'rpki.ripe.net', the
'signer to publisher' replication process should not alter the contents
of the filesystem hierarchy the RSYNC client is fetching from.
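For objective 1/, one POSIX-ish option is an advisory lock that the signer holds while writing a batch, which the replication job checks before copying anything. A minimal sketch using Python's `fcntl.flock` (Unix-only); the function names and lock file are hypothetical, and whether RIPE NCC's signer could hold such a lock is an assumption.

```python
import fcntl
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def signer_batch(lock_path: str) -> Iterator[None]:
    """Held by the (hypothetical) signer while it writes a set of objects."""
    with open(lock_path, "a") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # waits until no replication is running
        try:
            yield
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)


def try_replicate(lock_path: str, replicate: Callable[[], None]) -> bool:
    """Run replicate() only if the signer is not mid-batch.

    Returns False (and copies nothing) while the signer holds the lock, so a
    cron job simply skips this cycle instead of copying a half-written set.
    """
    with open(lock_path, "a") as fh:
        try:
            fcntl.flock(fh, fcntl.LOCK_SH | fcntl.LOCK_NB)
        except BlockingIOError:
            return False
        try:
            replicate()  # the signer cannot start a new batch meanwhile
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)
        return True
```

The lock alone does not address objective 2/: that needs the previous file tree to stay intact for clients that are already connected.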

To satisfy the above two conditions, I suspect a number of solutions
are available:

A) take ownership and control and only launch subsequent pipeline steps
when the Signer is done signing the latest requests. After a consistent
set of files has been written to disk, only then copy, stage, and switch
to the new directory contents using a symlink swap (allowing already
connected RSYNC clients to complete their fetch).
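A minimal sketch of the staging and symlink swap in option A, in Python. It assumes the rsync daemon's module path points at `base_dir/current` and that the daemon resolves that path when a client connects, so clients already mid-transfer keep reading the old directory; the function name and directory layout are hypothetical.

```python
import os
import tempfile
from typing import Dict


def publish_atomically(base_dir: str, files: Dict[str, bytes]) -> str:
    """Stage a complete set of files, then atomically repoint base_dir/current.

    Older snapshot directories are left in place so already-connected RSYNC
    clients can finish; pruning them later is not shown.
    Returns the name of the new snapshot directory.
    """
    staging = tempfile.mkdtemp(prefix="snapshot-", dir=base_dir)
    os.chmod(staging, 0o755)  # mkdtemp uses 0700; let the rsync daemon read it
    for rel_path, data in files.items():
        path = os.path.join(staging, rel_path)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as fh:
            fh.write(data)
    # Create the new link under a temporary name, then rename it over the old
    # one. rename(2) replaces the link itself in one step, so a newly
    # connecting client sees either the old tree or the new one, never a mix.
    name = os.path.basename(staging)
    tmp_link = os.path.join(base_dir, "current.tmp")
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(name, tmp_link)  # relative target, resolved inside base_dir
    os.replace(tmp_link, os.path.join(base_dir, "current"))
    return name
```

The swap should only run after the signer has finished writing a consistent set, which is the "only launch subsequent pipeline steps" part of option A.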

B) Use a load balancer to direct new RSYNC clients to a RSYNC server
containing the latest (consistent) set of files.

C) Make the RSYNC service pull from the latest (allegedly consistent)
RRDP snapshot.xml file, then move newly connected clients to the new
content using either the symlink [0] trick or an orchestrated
draining/on-ramping via a load balancer like haproxy.
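For option C, the RRDP snapshot format (RFC 8182) carries each object as a base64-encoded `<publish uri="rsync://...">` element, so an rsync file tree can be derived from it. A minimal sketch that turns a snapshot into a path-to-bytes mapping, which could then be staged and swapped in as in option A. It does not check the snapshot hash, session ID, or serial, and a production version would use a hardened XML parser for untrusted input; the function name is hypothetical.

```python
import base64
import xml.etree.ElementTree as ET
from typing import Dict
from urllib.parse import urlparse

RRDP_NS = "{http://www.ripe.net/rpki/rrdp}"


def snapshot_to_files(snapshot_xml: bytes) -> Dict[str, bytes]:
    """Map every <publish> element of an RRDP snapshot to (relative path -> bytes)."""
    root = ET.fromstring(snapshot_xml)
    if root.tag != RRDP_NS + "snapshot":
        raise ValueError("not an RRDP snapshot document")
    files: Dict[str, bytes] = {}
    for elem in root.iter(RRDP_NS + "publish"):
        uri = elem.get("uri", "")
        parsed = urlparse(uri)
        if parsed.scheme != "rsync":
            raise ValueError(f"unexpected URI: {uri}")
        rel_path = parsed.netloc + parsed.path  # host/module/path/to/object
        if ".." in rel_path.split("/"):
            raise ValueError(f"path traversal attempt in {uri}")
        # b64decode's default mode discards the newlines snapshots often contain
        files[rel_path] = base64.b64decode(elem.text or "")
    return files
```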

There is a wealth of knowledge available in this working group on how
POSIX-like systems work, how ISP operations work, and how the RPKI
works; I hope RIPE NCC can leverage that.

Kind regards,

Job

[0]: http://sobornost.net/~job/rpki-rsync-move.sh.txt

Re: [routing-wg] RPKI Invalid == Reject policies on the AS 3333 EBGP border

2021-04-01 Thread Job Snijders via routing-wg

Dear W. Boot,

On Thu, Apr 01, 2021 at 12:38:27PM +0200, W. Boot wrote:
> Would "invalid" also include unsigned space? 

No. By definition, unsigned space can never ever be "RPKI invalid".

In order for any BGP route to be marked as "RPKI invalid", a RPKI ROA
_MUST_ exist. Without covering ROAs, BGP routes cannot be "RPKI invalid".

> If it does, that might lead to legacy space or networks getting space
> through certain NIRs to be accidentally being blocked by whomever
> relying on this, unless these blocks can be exempt from inclusion?

Luckily it doesn't! :-) Operators who use RPKI to perform BGP Route
Origin Validation do so to detect & reject invalid routes. As
mentioned above, BGP routes can be recognized as 'invalid' only if a
covering ROA exists.
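A minimal sketch of the classification behind this, following RFC 6811 route origin validation: a route is 'valid' if some VRP (validated ROA payload: prefix, maxLength, ASN) covers it with a matching origin and an allowed prefix length, 'invalid' if VRPs cover it but none match, and 'not-found' if no VRP covers it at all. The `VRP` type and function names are hypothetical, not any router's or validator's API.

```python
import ipaddress
from typing import Iterable, NamedTuple


class VRP(NamedTuple):
    prefix: str       # e.g. "192.0.2.0/24"
    max_length: int   # longest prefix length the ROA authorizes
    asn: int          # authorized origin AS


def rov_state(route_prefix: str, origin_asn: int, vrps: Iterable[VRP]) -> str:
    """Classify one BGP route as 'valid', 'invalid' or 'not-found'."""
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for vrp in vrps:
        vrp_net = ipaddress.ip_network(vrp.prefix)
        if vrp_net.version != route.version or not route.subnet_of(vrp_net):
            continue  # this VRP says nothing about the route
        covered = True
        if vrp.asn == origin_asn and route.prefixlen <= vrp.max_length:
            return "valid"
    # Without any covering VRP a route can never be 'invalid'.
    return "invalid" if covered else "not-found"


def accept(route_prefix: str, origin_asn: int, vrps: Iterable[VRP]) -> bool:
    """'invalid == reject': drop only RPKI-invalid routes, keep everything else."""
    return rov_state(route_prefix, origin_asn, vrps) != "invalid"
```

With an empty VRP set every route is 'not-found' and accepted, which is why unsigned (for example legacy) space is unaffected and why ROV can be deployed incrementally.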

Complete and simple configuration examples can be found here:
http://bgpfilterguide.nlnog.net/guides/reject_invalids/

By exclusively focussing on "RPKI invalid" BGP routes, RPKI ROV is
incrementally deployable. Incremental deployability is a key factor.

Kind regards,

Job

Re: [routing-wg] Call for Presentations - RIPE 82

2021-03-20 Thread Job Snijders via routing-wg

Hi all,

The expectation is that we can watch material in the way it was intended,
and have the presenter around for live Q and A / discussion. Presenters can
even answer questions while the information is being distributed, which I
find to add a new level of interaction previously not possible!

If a presenter doesn’t want to prerecord (preference for living on the edge
and doing it “live!”) that is fine too. We won’t force anyone to present
live, however we do expect folks to be around to cover the interactive
element (which indeed is important).

I’m not suggesting the meetings are turned into linear television, that
indeed would not be adequate, rather that the formats of NANOG/NLNOG/Django
are followed.

Any presentation proposals on the topic of Routing are welcome, and
whatever delivery format the presenter envisions as the best method to
share their knowledge will be considered and facilitated with what
resources we can muster.

Proposals can be sent to routing-wg-cha...@ripe.net

Kind regards,

Job

[routing-wg] Call for Presentations - RIPE 82

2021-03-19 Thread Job Snijders via routing-wg

Dear RIPE Routing WG,

This is a call for presentation proposals for RIPE 82.

The RIPE 82 meeting takes place in about 8 weeks: https://ripe82.ripe.net/

We ask the Working Group and RIPE NCC for presentation proposals for the
illustrious 1 hour Routing WG slot on Thursday, May 20th 2021.

When you submit a proposal please also include slides for the chairs to
review. If you've presented similar material elsewhere please share with
us when and where.

We'll ask presenters to pre-record their talk to try to prevent local,
logistical, and/or routing problems from impacting the meeting. :-)

Kind regards,

Job, Paul, Ignas
Routing Working Group Chairs

Re: [routing-wg] RPKI Route Origin Validation and AS3333

2021-03-18 Thread Job Snijders via routing-wg

Dear RIPE NCC,

On Thu, Mar 18, 2021 at 04:03:16PM +0100, Nathalie Trenaman wrote:
> From the network operations perspective, there are no obstacles to
> enable ROV on AS

Excellent news!

> however, we have to consider that members or End Users who announce
> something different in BGP than their ROA claims, will be dropped and
> lose access to our services from their network. 

Another scenario where a member can't reach RIPE NCC is when the
Member's network is not connected directly or indirectly to RIPE NCC's
network. There are many many scenarios in which this can happen.

Imagine RIPE NCC purchases IP transit from Transit_X, and the member
purchases IP transit from Transit_Y, but Transit_X and Transit_Y for one
reason or another don't peer with each other. In such a network topology
no exchange of IP traffic is possible between RIPE NCC and the
Member.

The Internet is a 'mostly' connected graph of nodes, the default-free
zone is always in flux. Not everyone can reach everyone all the time.
Sometimes an operator has to walk to the local teahouse or jump on the
wifi network of their neighbor to help fix the connectivity issue.

There never is ANY guarantee all Members or End Users can exchange IP
traffic with RIPE NCC servers at all times. For this specific reason I
applaud the fact that the RIPE NCC 'member sign-up process' can be
executed online and ALSO via postal service. End-to-end Internet
connectivity is not a requirement to do business with RIPE NCC.

> From an analysis we made on 10 February, there were 511 of such
> announcements from our members and End Users.

quick side-note: Did your team check how many of those route
announcements are covered by less-specific 'valid' or 'not-found' route
announcements? or even by a default route? To me or this group the
answer is not that relevant, but I raise this comment to point out that
what matters most in service delivery is the end-to-end data-plane
connectivity, and rejecting a few RPKI invalid routes in and of itself
doesn't necessarily lead to loss of connectivity.
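To illustrate the side-note: rejecting a more-specific invalid route need not break reachability when a less-specific route or a default route remains. A toy longest-prefix-match sketch in Python; the RIB-as-dict representation and the function name are hypothetical.

```python
import ipaddress
from typing import Dict, Optional


def best_route(dst: str, rib: Dict[str, str]) -> Optional[str]:
    """Longest-prefix match after an 'invalid == reject' import policy.

    rib maps prefix -> RPKI ROV state ('valid', 'invalid' or 'not-found').
    Returns the prefix that would carry traffic to dst, or None if unreachable.
    """
    addr = ipaddress.ip_address(dst)
    usable = [
        ipaddress.ip_network(prefix)
        for prefix, state in rib.items()
        if state != "invalid" and addr in ipaddress.ip_network(prefix)
    ]
    if not usable:
        return None
    return str(max(usable, key=lambda net: net.prefixlen))


rib = {
    "203.0.113.0/24": "invalid",    # announcement contradicting its ROA: rejected
    "203.0.112.0/22": "not-found",  # less-specific with no covering ROA: kept
}
print(best_route("203.0.113.10", rib))  # 203.0.112.0/22
```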

> Our current RPKI Terms and Conditions do not mention that a Member or
> End User ROA should match their routing intentions, or any
> implications it may have if the ROA does not match their BGP
> announcement.

And indeed, the RPKI terms and conditions SHOULD NOT mention anything of
such nature. As Relying Parties we can never know what people actually
intended to publish in the RPKI. All any Relying Party knows is that the
holder of the private keys of a CA with a set of subordinate resources
managed to produce a cryptographically valid object validating according
to the RPKI CP (RFC 6484) and there is a valid chain towards the locally
present Trust Anchor Locator.

It is always laudable to try to stop children from running around with
scissors, but RIPE NCC can't really stop operators from hurting their
network presence. The best RIPE NCC can do is to try to design good User
Interfaces, and provide accurate documentation.

> If the community decides it is important that AS performs ROV, our
> legal team needs to update the RPKI Terms and Conditions to reflect
> the potential impact. 

I challenge the above assertion as I do not believe the legal team has
to update anything.

The RIPE NCC network is connected to the Internet as 'best effort'.

Whether specific individual IP packets originating from RIPE NCC's
servers arrive at the final destination or not is not relevant on a
case-by-case basis.

An IP packet might be dropped because of ethernet port congestion, a
routing partitioning gap in the BGP DFZ because of a peering dispute, a
submarine cable cut, a software defect, a member misconfiguring a RPKI
ROA, a local wifi problem, or any other reason...  it doesn't matter.

All we hope for is that when Internet outages occur, someone somewhere
is working on it. :-)

Kind regards,

Job

[routing-wg] Improving operations at RIPE NCC TA (Was: Delay in publishing RPKI objects)

2021-02-17 Thread Job Snijders via routing-wg

Dear RIPE NCC,

On Wed, Feb 17, 2021 at 11:28:32AM +0100, Nathalie Trenaman wrote:
> > The multitude of RPKI service impacting events as a result from
> > maloperation of the RIPE NCC trust anchor are starting to give me
> > cause for concern.
> 
> I’m sorry to hear this. Transparency is key for us, this means that we
> report any event. In this case, we were not compliant with our CPS and
> this non-publishing period had operational impact.

From the previous email there might be a misunderstanding about what
rpki-client and Routinator do. Both utilities help Relying Parties
validate X.509 signed CMS objects and transform the validated content
into authorizations and attestations. Neither utility is an SLA or CPS
compliance monitor. RIPE NCC - as CA operator - needs different tools.

Neither utility has been designed to interpret the Certification
Practice Statement (written in a natural language) and subsequently
programmatically transform the described 'Service Level' into metrics
suitable for monitoring.

A relying party can never tell the difference between a publication
pipeline being empty because CAs didn't issue new objects, or a
publication pipeline being empty because of a malfunction in one of RIPE
NCC's RPKI subsystems.

More examples of 'out of scope' functionality for Relying Party
software: validators don't monitor whether lirportal.ripe.net is
functional, whether RIPE NCC's BPKI API endpoints are operational, or
whether LIRs paid their invoices, the list is quite long. The validators
by themselves are the wrong tool for RPKI CPS/SLA monitoring.

You state "transparency is key for us", but I fear ad-hoc low-quality
a-posteriori reports are not the appropriate mechanism to impress and
reassure this community regarding 'transparency'.

I have some tangible suggestions to RIPE NCC that will improve the
reliability of the Trust Anchor and potentially help rebuild trust:

A need for Certificate Transparency
---

RIPE NCC should set up a Certificate Transparency project which publicly
shows which certificates (fingerprints) were issued when, and store such
information in immutable logs, accessible to all.
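A minimal sketch of the 'immutable log' idea as a simple hash chain in Python. Real Certificate Transparency (RFC 6962) uses Merkle trees with signed tree heads and supports efficient inclusion and consistency proofs; this linear chain is a deliberately simplified stand-in, and the class and field names are hypothetical. Outsiders detect rewritten history, including truncation at the tail, by comparing against a `head()` value they recorded earlier.

```python
import hashlib
import json
import time
from typing import Dict, List, Optional

GENESIS = "0" * 64


def _digest(body: Dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class IssuanceLog:
    """Append-only, hash-chained log of certificate issuance/revocation events."""

    def __init__(self) -> None:
        self.entries: List[Dict] = []

    def head(self) -> str:
        """Hash of the latest entry; publishing it lets others detect rewrites."""
        return self.entries[-1]["hash"] if self.entries else GENESIS

    def append(self, fingerprint: str, event: str, ts: Optional[float] = None) -> Dict:
        body = {
            "prev": self.head(),         # commits to the entire history so far
            "fingerprint": fingerprint,  # e.g. SHA-256 of the DER certificate
            "event": event,              # e.g. "issued" or "revoked"
            "ts": time.time() if ts is None else ts,
        }
        entry = dict(body, hash=_digest(body))
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; editing, reordering or removing a non-final entry breaks it."""
        prev = GENESIS
        for entry in self.entries:
            body = {k: entry[k] for k in ("prev", "fingerprint", "event", "ts")}
            if entry["prev"] != prev or entry["hash"] != _digest(body):
                return False
            prev = entry["hash"]
        return True
```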

How can anyone trust a Trust Anchor which does not offer transparency
about its issuance process?

Lack of transparency into signer software
---

The RIPE NCC WHOIS database software is open source, as is most of the
software for RIPE Atlas, K-ROOT, and other efforts RIPE NCC has
undertaken over the years.

Why has the signer source code still not been open sourced? Why can't members
review the code related to scheduled changes? Why is an organisation
proclaiming 'transparency' being opaque about how the RPKI certificates
are issued?

Lack of a public status dashboard
---

RIPE NCC should set up a website like https://rpki-status.ripe.net/
which shows dashboards with graphs and traffic lights related to each
(best effort) commitment listed in the CPS. RIPE NCC should continuously
publish & revoke & delete objects and verify whether those activities
are visible externally, and then automatically report whether any
potential delays observed are within the Service Levels outlined in the
CPS.

Metrics that come to mind:

* delta between last certificate issuance & successful publication
* Object count in the repository, repo size (and graphs)
* Time-To-Keyroll (and graphs on duration & frequency)
* Resource utilisation of various RPKI subsystems
* aggregate bandwidth consumption for RPKI endpoints (including RRDP, API,
  rsync)
* Graphs & logs of overlap between INRs listed on EE certificates under
  the RIPE TA and other commonly used TAs, matched against known
  transfers. This will help detect compromises as well as understand
  whether transfers are successful or not.
* Unique client IP count for RSYNC & RRDP for last hour/day/week
* Number of CS/hostmaster tickets mentioning RPKI
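A minimal sketch of the first metric (delay between issuance and successful publication) as a canary probe in Python. The callables are hypothetical: `issue_canary` would create a test object through the hosted RPKI interface, and `is_visible` would fetch the public repository (via RRDP or RSYNC) from outside RIPE NCC's network. The clock and sleep functions are injected so the logic can be tested without waiting; the 8-hour window is the RIPE-751 section 2.3 figure.

```python
from datetime import datetime, timedelta
from typing import Callable, Optional

# Publication window from RIPE-751 section 2.3 ("within eight hours").
CPS_PUBLICATION_WINDOW = timedelta(hours=8)


def probe_publication(
    issue_canary: Callable[[], str],
    is_visible: Callable[[str], bool],
    now: Callable[[], datetime],
    sleep: Callable[[float], None],
    poll_seconds: float = 60.0,
    give_up_after: timedelta = timedelta(hours=12),
) -> Optional[timedelta]:
    """Issue a canary object and poll until it is externally visible.

    Returns the issuance-to-publication delay, or None if the canary did not
    appear within give_up_after.
    """
    issued_at = now()
    handle = issue_canary()
    while now() - issued_at < give_up_after:
        if is_visible(handle):
            return now() - issued_at
        sleep(poll_seconds)
    return None


def within_cps(delay: Optional[timedelta]) -> bool:
    """Traffic light: green only if the canary appeared inside the window."""
    return delay is not None and delay <= CPS_PUBLICATION_WINDOW
```

Running such a probe continuously, and graphing the returned delays, would turn the CPS commitment into something the community can watch rather than read about after the fact.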

There are plenty of aspects to monitor; perhaps some notes could be
copied from how the DNS root is monitored.

Lack of operational experience with BGP ROV at RIPE NCC
---

I believe many potential future incidents related to the RIPE
NCC Trust Anchor can be prevented (or remediation time reduced) if RIPE
NCC themselves apply RPKI based BGP Origin Validation 'invalid ==
reject' policies on the AS  EBGP border routers. RIPE NCC OPS
themselves having a dependency on the RPKI services will increase
organization-wide exposure to the (lack of) well-being of the Trust
Anchor services, and given the short communication channels between the
OPS team and the RPKI team my expectation is that we'll see problems
being solved faster and perhaps even problems being prevented.

An analogy: RIPE NCC is a chef refusing to eat their own food.
How can we trust RIPE NCC to operate RPKI services, when RIPE NCC
itself refuses to apply the cryptographic products to its BGP

Re: [routing-wg] Delay in publishing RPKI objects

2021-02-16 Thread Job Snijders via routing-wg

Dear RIPE NCC,

On Tue, Feb 16, 2021 at 04:56:31PM +0100, Nathalie Trenaman wrote:
> On Monday, 15 February we encountered an issue with our RPKI software.
> This issue prevented us from publishing RPKI object updates from
> 08:07-18:06 (UTC). 
> 
> During this period, Certificate Authority activation and Route Origin
> Authorization configuration updates were delayed and therefore not
> visible in the RPKI repository.

It appears Certificate Authority revocation was also delayed.

> The updates were published after we restarted the system at 17:45
> (UTC), with full recovery completed by 18:06 (UTC).  Since this
> non-publishing period is shorter than our default RPKI object validity
> period, set to 8 hours, existing objects that are not updated were
> still valid. No data was lost during this period. 

Can the following phrase "default RPKI object validity period, set to 8
hours" please be clarified?

For objects produced in the RIPE-hosted RPKI environment I observe the
following validity periods are commonly used:

Object type        | validity duration after issuance
-------------------+---------------------------------
CRLs               | 24 hours
ROA EE certs       | 18 months
Manifest eContent  | 24 hours
Manifest EE certs  | 7 days
CAs                | 18 months

I'm just guessing: is the '8 hour' period a reference to RIPE-751
section 2.3?

"A certificate will be published within eight hours of being issued (or 
deleted)."

The RIPE-751 CPS also states in section 4.9.8 ("Maximum latency for
CRLs"): CRLs will be published to the repository system within one hour
of their generation. 

As the outage appears to have exceeded both the 1 hour revocation window
and 8 hour object publication window, RIPE NCC was not compliant with
its own CPS.
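The arithmetic behind that conclusion, as a short Python check using the times from the RIPE NCC report quoted above. It computes the worst case (an update submitted at 08:07 and only published at 18:06); updates submitted later in the window were delayed less. The function name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict

# Times as reported by RIPE NCC for Monday 15 February 2021 (UTC).
OUTAGE_START = datetime(2021, 2, 15, 8, 7)
OUTAGE_END = datetime(2021, 2, 15, 18, 6)

# Windows quoted above from the RIPE-751 CPS.
CPS_WINDOWS = {
    "certificate publication (section 2.3)": timedelta(hours=8),
    "CRL publication (section 4.9.8)": timedelta(hours=1),
}


def exceeded_windows(start: datetime, end: datetime,
                     windows: Dict[str, timedelta]) -> Dict[str, bool]:
    """For each CPS window, report whether the worst-case delay exceeded it."""
    worst_case = end - start  # an update submitted right at the start of the outage
    return {name: worst_case > limit for name, limit in windows.items()}


print(OUTAGE_END - OUTAGE_START)  # 9:59:00
print(exceeded_windows(OUTAGE_START, OUTAGE_END, CPS_WINDOWS))
```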

The multitude of RPKI service-impacting events resulting from
maloperation of the RIPE NCC trust anchor is starting to give me
cause for concern.

Kind regards,

Job
