Re: [routing-wg] RIPE NCC RPKI Routing Update May 2024
Dear RIPE NCC RPKI team, < Speaking with no hats > Thank you for sharing the current status, plans, and reasoning behind prioritization choices. I'm excited to see RIPE NCC's 'hosted' RPKI service offering starting to move beyond "just ROAs". Kind regards, Job On Mon, May 20, 2024 at 10:35:51AM +0200, Tim Bruijnzeels wrote: > Dear colleagues, > > Due to a full agenda, there will be no RIPE NCC RPKI update during the > Routing Working Group session at RIPE 88. Instead, I will give a short > high-level update during the RIPE NCC Services Working Group session. I would > also like to share a more detailed update on our work and changes since RIPE > 87 and our plans for the coming months. > > If you have any questions or comments, or if you want to express your ideas > on priorities, then please don't hesitate to talk to me in person at the RIPE > Meeting, join the RIPE NCC Services Working Group session or discuss on this > mailing list. > > RPKI Compliance Project (ISAE3000) > == > > The RIPE NCC is currently conducting an ISAE3000/SOC 2 audit for the RPKI > service. The SOC 2 type I audit is in its final stages. We are planning to > continue to work on a (recurring) SOC 2 type II audit in the years to come. > If you want to know more about this project, then I recommend you watch the > Technology Update that Felipe Victoria Silveira will give during the RIPE NCC > Services WG session at RIPE 88. > > Supporting the audit has taken significant resources from the development > team, but on the positive side, this has forced us to critically think about > all aspects of the RPKI service: software, infrastructure, processes and the > supporting organisation. 
As a result, we have made small but significant > improvements such as providing more human-friendly insight into the Trust > Anchor signing process, improving the database point-in-time recovery options > (using write-ahead logs), formally capturing existing engineering knowledge > in a business continuity plan, and updating the Certification Practice > Statement for transparency. > > ASPA Support in the Pilot > == > > ASPA has been supported in the RIPE NCC RPKI Pilot (‘localcert’) environment > since November 2022. We updated the implementation to use the latest ASPA > profile in January 2024. More information on using the API can be found in > this email sent to the IETF Sidrops mailing list: > https://mailarchive.ietf.org/arch/msg/sidrops/K_d8S0ZDXnK0-vXD33uyHc6RnkE/ > > For those unfamiliar with ASPA, the very short summary of it is that it > allows AS holders to declare who their provider ASNs are - in a BGP path > sense, not necessarily in a business relation sense. This can be used to > detect route leaks and to some degree (mainly dependent on uptake) path > spoofing attacks against ROV. > > For a longer explanation, I would like to refer you to my talk at SEE 12, > where I try to explain ASPA using examples [1]. For a longer and more > precise talk, I can recommend a presentation given by Ben Maddison at AfPIF > 2023 [2]. For a more fundamental understanding, you can read up on the formal > specification of the verification [3] and proof of correctness [4]. Of > course, there is more information available. So, if anyone has other useful > pointers here, please don’t hesitate to mention them. > > RPKI Dashboard Improvements > == > > As mentioned in the quarterly planning, we have been working on the RPKI > dashboard to improve its usability and make it possible to extend its > functionality with new object types. We have performed a user study of the > existing dashboard and have started the implementation of the new dashboard. 
> > We have been making good progress on this project and we expect to be able to > start public beta testing in about six weeks from now. The primary goal of > this beta testing is to ensure that the new dashboard works intuitively for > the users of the hosted RPKI service, before switching over from the current > dashboard. Of course, we also do our own testing, but input from real users > is invaluable. If you are interested in participating in beta testing, then > please let me know and I will make sure that we get in touch with you. > > Future Work > == > > Below, I will give an overview of several work items that we can pick up > after the current work on the RPKI dashboard is done. We have ideas about > what priorities to give to each item, but I would like to take this > opportunity to ask the members of this WG to speak up and share what > priorities they would give to each item. > > - ASPA > > Unfortunately, the discussion on the profile and RPKI to router payload has > not yet been completely settled in the IETF Sidrops WG. That said, there is > overwhelming support in the same WG for getting ASPA ready for deployment. > Furthermore, early adopters are testing ASPA
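As a rough illustration of the ASPA summary quoted above (AS holders declare their provider ASNs, and relying parties use that to spot route leaks), here is a minimal Python sketch of a per-hop check plus a simplified "upstream" path verification. It is loosely modelled on the verification procedure referenced as [3] but is not a faithful implementation, and the profile was still being discussed at the time: it assumes the route was received from a customer or lateral peer, collapses prepends, ignores AS_SETs, and stores ASPA data as a plain `dict` mapping a customer ASN to its set of provider ASNs (a hypothetical in-memory format).

```python
from enum import Enum


class Hop(Enum):
    """Outcome of checking one (customer, provider) pair against ASPA data."""
    NO_ATTESTATION = "No Attestation"
    PROVIDER_PLUS = "Provider+"
    NOT_PROVIDER_PLUS = "Not Provider+"


def hop_check(aspas: dict[int, set[int]], customer: int, provider: int) -> Hop:
    """Is `provider` listed in the ASPA published for `customer`?"""
    if customer not in aspas:
        return Hop.NO_ATTESTATION
    if provider in aspas[customer]:
        return Hop.PROVIDER_PLUS
    return Hop.NOT_PROVIDER_PLUS


def verify_upstream(aspas: dict[int, set[int]], as_path: list[int]) -> str:
    """Simplified ASPA check for a route received from a customer or peer.

    `as_path` is in BGP order (neighbor first, origin last). Prepends are
    collapsed; AS_SETs and routes learned from providers are out of scope.
    """
    # Drop consecutive duplicates caused by AS_PATH prepending.
    collapsed = [asn for i, asn in enumerate(as_path) if i == 0 or asn != as_path[i - 1]]
    # Walk from the origin towards the neighbor: every hop should go "up".
    hops = list(reversed(collapsed))
    results = [hop_check(aspas, hops[i], hops[i + 1]) for i in range(len(hops) - 1)]
    if Hop.NOT_PROVIDER_PLUS in results:
        return "Invalid"
    if all(r is Hop.PROVIDER_PLUS for r in results):
        return "Valid"
    return "Unknown"
```

With made-up ASNs: if AS 64500 attests only AS 64510 as its provider, a path in which 64500's route climbs to AS 64530 produces a "Not Provider+" hop and is classified Invalid, which is the signature of a leak; an origin that publishes no ASPA at all can only ever yield Unknown, which is why effectiveness depends on uptake.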
[routing-wg] RPKI ROA deployment now at 50%
Dear all, Fun news! RPKI ROAs now cover 50% of the global Internet’s routing table. We estimate 70% of traffic is sent towards ROV-valid destinations. An analysis of this milestone and of the propagation of invalid routes: https://www.kentik.com/blog/rpki-rov-deployment-reaches-major-milestone/ Kind regards, Job -- To unsubscribe from this mailing list, get a password reminder, or change your subscription options, please visit: https://lists.ripe.net/mailman/listinfo/routing-wg
[routing-wg] Call for presentation Routing WG at RIPE86 (Rotterdam, Netherlands)
Dear RIPE Routing WG, This is a call for presentation proposals for RIPE 86! The RIPE 86 meeting takes place in about 66 days: https://ripe86.ripe.net/ We ask both the Working Group and RIPE NCC for presentation proposals for the eminent 1.5 hour Routing WG slot on Wednesday, May 24th, 2023. The Routing Working Group is concerned with all aspects of IP routing technologies. This includes dissemination and discussion of issues affecting operators, new technologies and new applications of current technologies, and discussion of concerns relevant to inter- and intra-AS routing. A non-exhaustive list of topics of interest includes BGP, Routing Security, RPKI (ROV, ASPA, BGPsec), LSR (OSPF/ISIS), Anycast, and BGP Monitoring (MRT, BMP). When you submit a proposal, please also include slides for the chairs to review. If you've presented similar material elsewhere, please share with us when and where. Proposals can be sent to routing-wg-cha...@ripe.net We look forward to seeing you all in Rotterdam! Kind regards, Job, Ignas, Paul RIPE Routing Working Group Co-Chairs
[routing-wg] RPKI's 2022 Year in Review: growth & innovation
Dear all,

With 2023 at our doorstep, I'd like to share some perspective on how RPKI evolved in the year 2022.

Impact on the Global Internet Routing System
==

Decision makers might wonder: is investing time and resources worth it? What is the effectiveness of RPKI Route Origin Validation (RPKI-ROV)? In the last year a number of interesting reports were published.

Even though less than half of BGP routes are covered by RPKI ROAs [6], Kentik estimates [2], based on flow data, that nowadays the majority of IP traffic is destined towards RPKI-valid BGP routes. Their follow-up report [3] (analysing BGP control-plane data) suggests that evaluation of a BGP route as RPKI-invalid reduces its propagation by anywhere between one half and two thirds.

Cloudflare [4] published a report analysing data-plane connectivity between a select number of ASes and RPKI-invalid destinations: they estimate that 6.5% (lower bound) of residential Internet users enjoy the benefits of their ISP doing RPKI-ROV.

Another experiment report [5] (focussed on data-plane connectivity between validators and RPKI-valid/RPKI-invalid destinations) concluded that the existence of RPKI ROAs helped move 75% of test traffic towards the correct destination.

The above metrics might appear all over the place (6.5% up to 75%), but keep in mind these analyses are not mutually exclusive: observations of the Internet's topology are a function of the observer's vantage point. All the referenced reports agree on key points:

* ROAs have a measurable & significant impact on global IP traffic delivery
* RPKI-ROV helps reduce the "blast radius" of BGP routing incidents
* They recommend continuing the global deployment of RPKI-ROV (rejecting RPKI-invalid BGP routes), and creating ROAs for all IP address space.

Year to Year Growth of the distributed RPKI database
==

In comparison to "effectiveness", the bare existence, size, contents, and number of Signed Objects in the globally distributed RPKI repository system are much easier to quantify.
The below table was constructed by comparing two December 31st RPKIviews.org snapshots [1] of validated RPKI caches, primed with the ARIN, AFRINIC, APNIC, LACNIC, and RIPE Trust Anchors.

                                     2021-12-31     2022-12-31
Total cache size (KiB):                 996,216      1,240,572  (+24%)
Total number of files (objects):        192,503        242,969  (+26%)
Publication servers (FQDNs):                 36             52  (+44%)
Certification authorities:               28,328         34,901  (+23%)
Route origin authorizations:            101,645        138,323  (+36%)
Unique VRPs:                            302,025        390,752  (+29%)
IPv4 addresses covered:           1,139,561,719  1,354,270,410  (+19%)
IPv6 addresses covered:           7,499,405,083  9,446,853,925  (+26%)  *10^24
Unique origin ASNs in ROAs:              27,174         34,455  (+27%)

A healthy growth rate across the board! With the ubiquitous availability of "Publication as a Service" hosted by RIRs, I expect (and hope!) the number of distinct publication servers to stop growing, or even drop, in 2023.

The number of Certification Authorities (CAs) closely corresponds to the number of RIR members (RIR customers) who opted to enable RPKI services for their Internet Number Resources, making it a useful proxy metric to understand how many organisations are creating RPKI ROAs.

A single Route Origin Authorization (ROA) can contain one or more Validated ROA Payloads (VRPs), and multiple ROAs can contain the exact same VRP information. "Unique" in the above table indicates the metric's underlying data was deduplicated. Each ROA can only contain a single Origin ASN; multiple ROAs can refer to the same Origin ASN value.
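To make the counting rules above concrete, the sketch below models a ROA as one origin ASN plus a list of (prefix, maxLength) entries, and shows how deduplication turns raw ROA contents into "Unique VRPs" and "Unique origin ASNs" style counts. The `ROA` and `VRP` classes are hypothetical simplifications for illustration, not the actual ROA profile or any validator's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VRP:
    """Validated ROA Payload: one (prefix, maxLength, origin ASN) tuple."""
    prefix: str
    max_length: int
    asn: int


@dataclass
class ROA:
    """Simplified ROA: exactly one origin ASN, one or more (prefix, maxLength) entries."""
    asn: int
    entries: list[tuple[str, int]]


def expand(roa: ROA) -> list[VRP]:
    """Each prefix entry in a ROA yields one VRP carrying the ROA's single ASN."""
    return [VRP(prefix, max_len, roa.asn) for prefix, max_len in roa.entries]


def table_metrics(roas: list[ROA]) -> dict[str, int]:
    """Raw versus deduplicated counts, in the spirit of the table rows."""
    all_vrps = [vrp for roa in roas for vrp in expand(roa)]
    return {
        "roas": len(roas),
        "vrps_raw": len(all_vrps),
        "unique_vrps": len(set(all_vrps)),  # frozen dataclasses are hashable
        "unique_origin_asns": len({roa.asn for roa in roas}),
    }
```

For example, two ROAs for AS 64500 that both list 192.0.2.0/24 (maxLength 24) plus one ROA for AS 64501 give four raw VRPs but only three unique VRPs and two unique origin ASNs, which is why the "Unique" rows can grow at a different rate than the ROA count.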
Innovation through Standardisation
==

The IETF SIDROPS [7] working group (the designated forum in which volunteers collaborate to define and specify open standards for RPKI and RPKI-based technologies) was fairly productive in 2022 and managed to publish 5 RFCs:

RFC 9286 - Manifests for the RPKI (revision)
RFC 9255 - The 'I' in RPKI Does Not Stand for Identity (clarification)
RFC 9319 - The Use of maxLength in the RPKI (clarification)
RFC 9323 - A Profile for RPKI Signed Checklists (RSCs) (innovation)
RFC 9324 - Policy Based on the RPKI without Route Refresh (innovation)

The above body of work consists mostly of revisions of older work or clarifications on how to use the RPKI. To me this demonstrates a somewhat conservative approach (rather than innovation at breakneck speed), which I consider a good thing.

Outlook & Conclusion
==

Now that Route Origin Validation has advanced globally as far as it has, the next obvious target is BGP path validation, to mitigate two distinct problems: BGP route leaks and BGP AS_PATH spoofing. Both painful to network
Re: [routing-wg] Proposed Service Criticality Form
Dear RIPE NCC, Thanks for offering the opportunity to share feedback. I'd like to comment in an individual capacity on the proposed Service Criticality ratings. Summary: I consider the proposed criticality ratings appropriate for the RPKI service. Elaboration: 1/ Confidentiality Comment: The purpose of RPKI keys is signing and signature chain verification, a purpose very different from encryption (encryption being a prerequisite to achieve confidentiality). 2/ Integrity Comment: Unauthorized access to private key material, or to interfaces which trigger operations with private key material (such as the LIR portal), poses a high risk to operators. It is easy to imagine a multitude of scenarios where a compromise of integrity leads to high severity incidents. 3/ Availability Comment: The ability to issue, revoke, and verify RPKI certificates is a 24/7 operational aspect where any downtime stalls business processes. For example, being unable to rectify a misconfigured ROA can have a very adverse impact on business. A high severity rating seems proper. I'm happy to elaborate on any of the above comments if they raise questions! Kind regards, Job On Thu, Dec 15, 2022 at 04:17:48PM +0100, Nathalie Trenaman wrote: > Dear Colleagues, > > During RIPE 84 in May, we discussed the Service Criticality Framework. We > mentioned that we would like the Routing Working Group's input on the > criticality rating for the RPKI service. > > Thank you to the Co-chairs for asking the working group for input in June > with the original form: > https://www.ripe.net/ripe/mail/archives/routing-wg/2022-June/004582.html > > Since we did not receive much feedback so far, we hereby share our proposed > completed form with you (see attached PDF). We hope to receive your feedback > by 23 December. 
> > Kind regards > Nathalie Trenaman > RIPE NCC
Re: [routing-wg] Proposed Service Criticality Form
Dear all, [[ ... If you are looking for a fun way to spend some time on your second-last Friday of the year ... please read on! ... :-) ... ]] The working group is encouraged to consider commenting on the Service Criticality Framework proposal for RPKI. Understanding the community's wishes and expectations around RPKI Service Delivery will greatly help RIPE NCC improve RPKI services. Kind regards, Job Routing-WG Co-chair
Re: [routing-wg] RPKI ROAs and Monitoring
Hi Klaus! On Mon, Dec 12, 2022 at 12:12:03PM +0100, Klaus Darilion via routing-wg wrote: > Until now we have not used RPKI. For us at nic.at and RcodeZero DNS we > are not on the validating side of RPKI, but we would only create ROAs, > using the RIPE service. I could just login to the RIPE portal and in 5 > minutes it is done. But I am a bit concerned about activating the > service and do not care anymore. Hence I think we should have some > monitoring too. Monitoring your ROAs is a really good idea! I recommend taking a look at this presentation https://www.youtube.com/watch?v=cJUkOu9nWT8 > We have a defined target state, eg. prefix 83.136.32.0/21 should be > announced from AS30971. So I think our monitoring should check: > > - is there a ROA for 83.136.32.0/21 from AS30971 > - is the ROA valid, ie. not expired > - Will validating ISPs accept these prefixes? Will validating > ISPs reject this prefix if the origin AS is wrong (maybe having a local > Routinator or querying a public service via API). Indeed, validating ISPs will reject the BGP announcement if the Origin AS is incorrectly configured in the ROA. Make sure to not make any typos when creating ROAs! :-) Here is a blog post that details the impact of misconfigured ROAs (and conversely, the positive impact of correctly configured ROAs!) https://www.kentik.com/blog/how-much-does-rpki-rov-reduce-the-propagation-of-invalid-routes/ > Do you think this makes sense? Is such monitoring already available > and I only have to subscribe somewhere (free or commercial)? Do I miss > something? Any hints what I should do before and after creating the > ROAs? One dataset to check for RPKI objects related to your prefixes is https://console.rpki-client.org/dump.json.gz (for all details) or https://console.rpki-client.org/vrps.json (for a condensed version) > PS: What happens if my ROAs expire? Will then my BGP announcements be > ignored by validating ISPs or will it just be as if there are no ROAs > at all? 
Indeed, then it will be as if there are no ROAs at all. Kind regards, Job
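The checks Klaus lists map fairly directly onto the RFC 6811 route origin validation procedure. Below is a minimal Python sketch of such a monitoring check. It assumes the VRP file has a top-level "roas" list whose entries carry "prefix", "maxLength", "asn" and "expires" (Unix timestamp) keys; please verify that layout against the actual https://console.rpki-client.org/vrps.json before relying on it. `check_target` is a hypothetical helper name; the prefix and AS pair are the ones from Klaus's example.

```python
import ipaddress
import json
import time
from typing import Optional


def load_vrps(path: str) -> list[dict]:
    """Load VRPs from a vrps.json-style file (assumed layout: top-level "roas" list)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)["roas"]


def parse_asn(value) -> int:
    """Accept both 30971 and "AS30971" spellings of an ASN."""
    return int(str(value).upper().removeprefix("AS"))


def rov_state(vrps: list[dict], prefix: str, origin: int, now: Optional[float] = None) -> str:
    """RFC 6811 origin validation for one (prefix, origin AS) pair.

    VRPs whose "expires" time lies in the past are skipped, so an expired
    ROA has the same effect as no ROA at all.
    """
    now = time.time() if now is None else now
    route = ipaddress.ip_network(prefix)
    covered = False
    for vrp in vrps:
        if vrp.get("expires", float("inf")) <= now:
            continue  # expired: ignore entirely
        vrp_net = ipaddress.ip_network(vrp["prefix"])
        if vrp_net.version != route.version or not route.subnet_of(vrp_net):
            continue  # this VRP does not cover the route
        covered = True
        asn = parse_asn(vrp["asn"])
        # AS0 never matches; otherwise origin and maxLength must both fit.
        if asn != 0 and asn == origin and route.prefixlen <= int(vrp["maxLength"]):
            return "valid"
    return "invalid" if covered else "not-found"


def check_target(vrps: list[dict], prefix: str, origin: int,
                 now: Optional[float] = None) -> list[str]:
    """Hypothetical monitoring check: returns a list of problems (empty means OK)."""
    state = rov_state(vrps, prefix, origin, now)
    return [] if state == "valid" else [f"{prefix} from AS{origin} is RPKI-{state}"]
```

Usage would be along the lines of `check_target(load_vrps("vrps.json"), "83.136.32.0/21", 30971)` from a cron job, alerting on any non-empty result. Note that a relying-party cache normally drops expired objects itself, so the expiry filter here mainly makes the "expired ROA means no ROA" behaviour explicit.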
Re: [routing-wg] Call for presentation Routing WG at RIPE85 (Belgrade, Serbia)
Dear RIPE Routing WG, This is a repeat of the call for presentation proposals for RIPE 85. The RIPE 85 meeting takes place in about 32 days: https://ripe85.ripe.net/ We ask the Working Group and RIPE NCC for presentation proposals for the illustrious 1.5 hour Routing WG slot on Wednesday, October 26th, 2022. When you submit a proposal, please also include slides for the chairs to review! :-) Proposals can be sent to routing-wg-cha...@ripe.net Kind regards, Job, Ignas, Paul Routing Working Group Chairs
[routing-wg] Call for presentation Routing WG at RIPE85 (Belgrade, Serbia)
Dear RIPE Routing WG, This is a call for presentation proposals for RIPE 85. The RIPE 85 meeting takes place in about 64 days: https://ripe85.ripe.net/ We ask the Working Group and RIPE NCC for presentation proposals for the illustrious 1.5 hour Routing WG slot on Wednesday, October 26th, 2022. When you submit a proposal please also include slides for the chairs to review. If you've presented similar material elsewhere please share with us when and where. Proposals can be sent to routing-wg-cha...@ripe.net Kind regards, Job, Ignas, Paul (your humble Routing Working Group Chairs)
[routing-wg] Frequently Asked Questions about 2000::/12 and related routing errors
Dear all,

Last night many people received "Resource Certification (RPKI)" alerts, which in turn caused my phone to light up with questions! :-) In the below message I'll attempt to provide an analysis of what happened and answer frequently asked questions:

* What happened?
* Has this happened before?
* Why didn't RPKI Route Origin Validation (ROV) stop this?

What happened?
==

As reported in the media (https://twitter.com/DougMadory/status/1544862409336184832), one Internet Service Provider announced to the world - through the BGP protocol - that all Internet Protocol addresses contained within 2000::/12 were reachable via them. This was a routing error, an error condition which triggered various monitoring systems around the globe.

Background: The BGP Default-Free Zone is composed of ~150,000 IPv6 networks originated from ~24,000 Autonomous Systems (ASes). The totality of this is what forms the IPv6 Internet. The majority of these networks have a prefix length in the range of /32 up to /48. Currently the world's largest IPv6 assignments (of which there are very few) are clocking in at /19. So, a /12 ("slash twelve") BGP announcement covers an exceptionally large number of IP addresses!

Last night's /12 BGP announcement covered such a large block of address space that it happened to overlap with about 21,292 existing networks originated by 3,697 ASes. For roughly 69% (14,695) of those networks RPKI ROAs had been created. 2,176 of those "RPKI ROA covered existing networks" (about 10% of all overlapped networks) are IPv6 space managed under the RIPE NCC umbrella. I imagine a few hundred operators received alerts from RIPE NCC with a suggestion to consider creating corresponding ROAs to make the 2000::/12 announcement valid; however, no ISP can create such a ROA, because no single ISP is authoritative for the entirety of that block. :)

Has this happened before?
==

Yes. This type of routing error happens almost annually.
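Returning briefly to the numbers in "What happened?": to get a feel for the scale of a /12 and for what "overlapping" means, the sketch below uses Python's standard `ipaddress` module. The list of existing networks is a hypothetical stand-in for a real IPv6 BGP table; this is not how the counts above were produced, only an illustration of the underlying prefix arithmetic.

```python
import ipaddress

ANNOUNCEMENT = ipaddress.ip_network("2000::/12")


def overlapping(announcement, existing: list[str]) -> list[str]:
    """Existing prefixes that share any address space with `announcement`."""
    return [p for p in existing if announcement.overlaps(ipaddress.ip_network(p))]


def subprefix_count(announcement, length: int) -> int:
    """Number of prefixes of the given length that fit inside `announcement`."""
    if length < announcement.prefixlen:
        raise ValueError("length must not be shorter than the announcement")
    return 2 ** (length - announcement.prefixlen)


print(subprefix_count(ANNOUNCEMENT, 48))  # 68719476736 possible /48s
print(subprefix_count(ANNOUNCEMENT, 19))  # 128 possible /19s
```

Even the very largest (/19) assignments fit 128 times into a single /12, which is why one such announcement can overlap tens of thousands of existing networks.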
Some time ago Tom Strickx reported an incident involving 2400::/12, a block which nowadays overlaps with more than 40,000 networks! (source: https://twitter.com/Jerome_UZ/status/1145136294835523584) If my memory serves me right, back in 2016 AS 1299 originated both 2000::/6 and 2000::/12, and later that year AS 10026 also originated 2000::/12 for a bit.

So... how exactly can this happen? I believe it is a mixture of user interfaces with really sharp edges and permissive EBGP filters. Many router-to-router linknets are assigned a /127 [RFC 6164] or a /64 [RFC 7421], and loopback addresses generally are assigned a /128 (a single address). It's not hard to imagine that, when copy+pasting or typing by hand, an operator fails to input the last digit (respectively the 7 in the case of /127, the 4 in /64, or the 8 in /128), resulting in a configuration with a /12 or a /6 as the prefix length. See these Cisco & Juniper terminal transcript examples for a demonstration of failing to correctly enter the last digit of "2001:67c:208c::/128": https://chloe.sobornost.net/~job/slash-twelve.txt

Why didn't RPKI ROV stop this?
==

Creating RPKI ROAs and performing Route Origin Validation (ROV) on received BGP route announcements helps protect against mishaps with unauthorized "same-length" and "more-specific" announcements. ROV (by design) does nothing against unauthorized "larger overlapping" route announcements (such as 2000::/12): a ROA for a smaller block does not cover a less-specific route, so such a route is simply "not found". The design relies on the fact that the Internet's global routing system is based on the Longest Prefix Match (LPM) algorithm (see https://en.wikipedia.org/wiki/Longest_prefix_match).

LPM means that as long as your certified address space is in the global routing table, a less-specific announcement (such as 2000::/12) is not very likely to draw IP traffic away from your network. In incidents like these the major impact seems to be that monitoring systems are triggered (which is appropriate!). I suspect there is virtually no impact to business operations (fortunately!).
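The failure mode and the reason LPM limits the damage can both be reproduced with Python's `ipaddress` module. In this sketch, `strict=False` stands in for a configuration interface that accepts the input and silently masks off host bits; it is an assumption for illustration, not a claim about how any particular vendor CLI behaves. The /19 to /48 acceptance window in `accept_on_ebgp` is likewise a hypothetical sanity filter loosely based on the figures above (largest assignments around /19, most networks /32 to /48), not a recommended policy.

```python
import ipaddress

# Hypothetical EBGP sanity bounds for IPv6 prefix lengths (illustrative only).
SHORTEST_ACCEPTED = 19
LONGEST_ACCEPTED = 48


def parse_like_lenient_cli(text: str) -> ipaddress.IPv6Network:
    """Parse a prefix, silently masking off host bits instead of rejecting it."""
    return ipaddress.IPv6Network(text, strict=False)


def accept_on_ebgp(net: ipaddress.IPv6Network) -> bool:
    """Reject implausibly short (or overly long) IPv6 announcements."""
    return SHORTEST_ACCEPTED <= net.prefixlen <= LONGEST_ACCEPTED


def longest_prefix_match(table: dict, destination: str):
    """Return the next hop of the most specific route covering `destination`."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in table if addr in net]
    if not matches:
        return None
    return table[max(matches, key=lambda net: net.prefixlen)]


# Dropping the final "8" of a /128 loopback turns it into a covering /12.
typo = parse_like_lenient_cli("2001:67c:208c::/12")
print(typo)                  # 2000::/12
print(accept_on_ebgp(typo))  # False

routes = {
    typo: "leaked /12",
    ipaddress.IPv6Network("2001:67c:208c::/48"): "legitimate origin",
}
print(longest_prefix_match(routes, "2001:67c:208c::1"))  # legitimate origin
```

The same masking turns "2001:67c:208c::/6" into 2000::/6, matching the 2016 anecdote. As long as the legitimate /48 is present, LPM keeps traffic for it on the legitimate path; only destinations with no more-specific route would follow the leaked /12.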
Questions welcome! Kind regards, Job
[routing-wg] RPKI Service Criticality Questionnaire
Dear all,

RIPE NCC has asked the Routing WG Chairs to facilitate a working group conversation on framing the subcomponents of RIPE NCC's RPKI services in terms of criticality. At the bottom of this email is a form that focusses on three components: confidentiality, integrity and availability. Each component is split into three questions (a, b, and c); a total of 9 questions are being put forward to the working group.

We envision this process to be a public consultation: WG participants can submit (free-form) responses, and also chime in by replying to each other's responses; hopefully bringing us to a degree of consensus in the coming weeks.

I believe this is a unique opportunity to help RIPE NCC! Investing our time will, in turn, help us rely on and integrate RIPE NCC's RPKI services in our production environments. The goal is to help RIPE NCC develop a deeper understanding of how the moving parts fit together, which in turn helps decide where and how to invest resources.

>>> Your feedback is much appreciated! <<<

NOTE: if you are *NOT* a RIPE NCC member, and use the RIPE NCC Trust Anchor (e.g. as Relying Party to make informed routing decisions, inside and outside the RIPE region), your feedback *also* is much appreciated.

Kind regards,
Job, Ignas, Paul
Routing WG co-chairs

--- FORM STARTS BELOW ---

Service Criticality Questionnaire Form - RPKI
=============================================

Introduction
------------

This form is used to gather input from the community on the service criticality of the RPKI Service from RIPE NCC. The framework is detailed in:
https://labs.ripe.net/author/razvano/service-criticality-framework/

The service criticality has three components:

* Confidentiality: What is the highest possible impact of a data confidentiality-related incident (e.g. data leak)?
* Integrity: What is the highest possible impact of a data integrity-related incident (e.g. hacking)?
* Availability: What is the highest possible impact of a service availability-related incident (e.g. outage)?
(All RIPE NCC services are designed with at least 99% availability, so please consider outages of up to 22 hours per quarter.)

Service purpose
---------------

The RIPE NCC RPKI Service is the RPKI Trust Anchor (TA) for the RIPE NCC service region, and consists of:

* RPKI Dashboard (in the LIR portal)
* Repositories (rsync/RRDP)
* Certification Authorities (CAs)
* RPKI Management API
* Hardware Security Modules (HSMs)
* Datasets

Service Criticality
-------------------

Please review the following three areas.

## (1) Global Routing Incident Severity

* Low (No / negligible impact)
* Medium (One or a few ASes are unavailable)
* High (Many ASes in a region are unavailable)
* Very High (Global Internet routing disruptions)

Please rate the incident severity (Low to Very High) in the following three areas. Please explain why.

(a) Confidentiality (Impact level of incidents such as data leaks)
Answer 1a:

(b) Integrity (Impact level of incidents such as hack attempts)
Answer 1b:

(c) Availability (Impact level of service outage incidents, up to 22 hours per quarter)
Answer 1c:

## (2) IP addresses and AS Numbers Incident Severity

* Low (No / negligible impact)
* Medium (Local disruptions: registration information not being available for some entities)
* High (Regional disruptions: registration information not being available for the RIPE NCC region)
* Very High (Global disruptions: lack of registration information for all AS Numbers and IP addresses)

Please rate the incident severity (Low to Very High) in the following three areas. Please explain why.
(a) Confidentiality (Impact level of incidents such as data leaks)
Answer 2a:

(b) Integrity (Impact level of incidents such as hack attempts)
Answer 2b:

(c) Availability (Impact level of service outage incidents, up to 22 hours per quarter)
Answer 2c:

## (3) Global DNS Incident Severity

* Low (No / negligible impact)
* Medium (Local disruptions)
* High (Regional disruptions)
* Very High (Global disruptions)

Please rate the incident severity (Low to Very High) in the following three areas. Please explain why.

(a) Confidentiality (Impact level of incidents such as data leaks)
Answer 3a:

(b) Integrity (Impact level of incidents such as hack attempts)
Answer 3b:

(c) Availability (Impact level of service outage incidents, up to 22 hours per quarter)
Answer 3c:

--- FORM ENDS ---
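The "22 hours" used throughout the form appears to follow from applying the 99% availability target to a calendar quarter, as the "per quarter" wording in the (c) questions suggests. A quick Python check of that arithmetic, assuming an average quarter of 365.25 / 4 days (the constants and function name are illustrative):

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365.25


def downtime_budget_hours(availability: float, period_days: float) -> float:
    """Maximum downtime (in hours) permitted by an availability target over a period."""
    return (1.0 - availability) * period_days * HOURS_PER_DAY


quarter_days = DAYS_PER_YEAR / 4  # 91.3125 days
budget = downtime_budget_hours(0.99, quarter_days)
print(f"{budget:.1f} hours per quarter")  # 21.9 hours per quarter
```

Rounded, that is the 22-hour outage window respondents are asked to consider.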
[routing-wg] state of RPKI-invalid objects in IRR databases (2022.05.16)
Dear all,

On the #DENOG IRC channel I was asked for current stats on the number of RPKI-invalid IRR route/route6 objects in various databases, as a follow-up to a talk at RIPE81 [0]. I figured I should share this with the WG too.

Below is a table with today's number of invalid route/route6 objects per IRR source, obtained by applying the RFC 6811 origin validation algorithm with the prefix in the "route:" attribute and the origin AS in the "origin:" attribute as input.

Source         IPv4 invalids  IPv6 invalids  Notes
AFRINIC                  359             12  authoritative
ALTDB                      1            191  note 4
APNIC                  21861           1880  authoritative
ARIN                     814             65  authoritative
BBOI                      44              1
BELL                     322              0
JPIRR                     95              4
LACNIC                     0              0  authoritative (note 3)
LEVEL3                 12925            182
NTTCOM                 65513            730
RADB                  208901          12829
RGNET                      2              0
RIPE                   28390           3518  authoritative
RIPE-NONAUTH               5              0  note 5
TC                         0              0  note 2

Some notes on the above table:

1) ARIN-NONAUTH is not listed; ARIN deprecated this IRR source a month ago [2].
2) TC achieved a perfect 0/0 score by using the IRRd v4 RPKI integration [3].
3) LACNIC's IRR service is an information proxy for RPKI ROAs valid under the LACNIC Trust Anchor. This by definition means that all IRR objects in the LACNIC IRR database are RPKI-valid.
4) ALTDB periodically runs a script to delete RPKI-invalid objects.
5) RIPE-NONAUTH imposes a two-week delay before deleting RPKI-invalid objects, so the 5 IPv4 objects currently marked as invalid will disappear in the next few days, unless the covering RPKI ROAs are withdrawn before the timer expires.

The stats are generated by downloading the IRR database dump for each source and running a simple Python script [1].
Kind regards,
Job

[0]: https://ripe81.ripe.net/presentations/59-IRRd-RIPE812.pdf
[1]: https://github.com/job/irr-nonauth-cleanup
[2]: https://www.arin.net/announcements/20220128-irr/
[3]: https://irrd.readthedocs.io/en/stable/admins/rpki/
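The classification described in this message boils down to: parse each route:/route6: object, take its prefix and origin:, and flag it if RFC 6811 would call that pair invalid. A simplified Python sketch follows. It is not the script referenced as [1]; the RPSL parser ignores continuation lines and comments, and the in-memory VRP format (a list of `(ip_network, maxLength, ASN)` tuples) is a hypothetical choice.

```python
import ipaddress
from collections import Counter


def route_objects(dump: str):
    """Yield (prefix, origin ASN) from route:/route6: objects in an RPSL dump.

    Simplified: objects are separated by blank lines; comment lines and
    continuation lines are skipped.
    """
    for block in dump.split("\n\n"):
        attrs = {}
        for line in block.splitlines():
            if not line or line[0] in "#% \t+" or ":" not in line:
                continue
            key, _, value = line.partition(":")
            attrs.setdefault(key.strip().lower(), value.split("#", 1)[0].strip())
        prefix = attrs.get("route") or attrs.get("route6")
        if prefix and "origin" in attrs:
            yield prefix, int(attrs["origin"].upper().removeprefix("AS"))


def is_invalid(vrps: list, prefix: str, origin: int) -> bool:
    """RFC 6811 'invalid': at least one VRP covers the route, none matches it."""
    route = ipaddress.ip_network(prefix)
    covering = [(net, max_len, asn) for net, max_len, asn in vrps
                if net.version == route.version and route.subnet_of(net)]
    matched = any(asn != 0 and asn == origin and route.prefixlen <= max_len
                  for _, max_len, asn in covering)
    return bool(covering) and not matched


def count_invalids(dump: str, vrps: list) -> Counter:
    """Count RPKI-invalid route objects per address family."""
    counts = Counter()
    for prefix, origin in route_objects(dump):
        try:
            family = f"ipv{ipaddress.ip_network(prefix).version}"
            if is_invalid(vrps, prefix, origin):
                counts[family] += 1
        except ValueError:
            continue  # malformed prefix (e.g. host bits set): skip it
    return counts
```

Running `count_invalids` once per downloaded IRR dump would yield one row of a table like the one above; objects not covered by any VRP are "not found" and are deliberately not counted.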
[routing-wg] follow-up on RPKI open house: are CRLs used?
Hi,

A question was raised during today's RPKI Data Open House: "are CRLs used?" Just now I ran some statistics on today's RPKI state using the 5 RIR TALs:

* There are 30,914 CRL files.
* In total, these 30K CRLs list revocations for 331,637 serials.
* 5,369 CRL files don't list revoked serials (at this point in time).
* On average, the non-empty CRLs list 12 revoked certs.

Further analysis can be performed using this JSON representation of the global RPKI state:
https://console.rpki-client.org/dump.json (174 megabytes)
https://console.rpki-client.org/dump.json.gz (38 megabytes)

The dump.json file is regenerated a few times a day.

Kind regards,
Job
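The statistics above are simple aggregates over per-CRL revocation counts. Assuming those counts have already been extracted (for example from the dump.json linked above; its schema is not described here, so the extraction step is deliberately left out), the aggregation might look like this Python sketch. `CrlStats` and `crl_stats` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class CrlStats:
    crl_files: int
    revoked_serials: int
    empty_crls: int
    mean_per_nonempty: float


def crl_stats(revoked_per_crl: list[int]) -> CrlStats:
    """Aggregate per-CRL revocation counts (one integer per CRL file)."""
    nonempty = [n for n in revoked_per_crl if n > 0]
    total = sum(nonempty)
    mean = total / len(nonempty) if nonempty else 0.0
    return CrlStats(
        crl_files=len(revoked_per_crl),
        revoked_serials=total,
        empty_crls=len(revoked_per_crl) - len(nonempty),
        mean_per_nonempty=mean,
    )
```

Plugging in the headline figures, 331,637 serials spread over 30,914 - 5,369 = 25,545 non-empty CRLs gives a mean of just under 13 revoked certificates per non-empty CRL.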
[routing-wg] Fw: 2749 routes AT RISK - Re: TIMELY/IMPORTANT - Approximately 40 hours until potentially significant routing changes (re: Retirement of ARIN Non-Authenticated IRR scheduled for 4 April 2
Hi all!

Sharing here as FYI. The impact of this event is very hard to understand.

Aspect 1/ BGP routes might be impacted depending on *when* IRR mirror operators remove ARIN-NONAUTH from their list of sources (as far as I understand, ARIN will disable FTP/NRTM access). Commonly used mirror operators are RADB, NTT, Lumen, and of course the local IRRd instances operators might be running to speed up prefix-filter generation. The 'peval' and 'bgpq3' utilities by default point at whois.radb.net; the 'bgpq4' utility by default points at rr.ntt.net.

Aspect 2/ Overlapping (less-specific) objects might exist in other databases, masking this event to some degree.

All timers in this IRR supply chain are unknown variables. Below is a list of prefixes which *might* be affected; my hope is that this list assists in troubleshooting efforts in the coming days / weeks.

Kind regards,
Job

- Forwarded message from Job Snijders -

Date: Mon, 4 Apr 2022 17:56:45 +0200
From: Job Snijders
To: na...@nanog.org
Subject: 2749 routes AT RISK - Re: TIMELY/IMPORTANT - Approximately 40 hours until potentially significant routing changes (re: Retirement of ARIN Non-Authenticated IRR scheduled for 4 April 2022)

Dear all,

On Sat, Apr 02, 2022 at 09:09:58PM +, John Curran wrote:
> As previously reported here, ARIN will be shutting down the
> ARIN-NONAUTH IRR database on Monday, 4 April 2022 at 12:00 PM ET.
>
> It is quite likely that some network operators will see different
> route processing as a result of this change, as validation against the
> ARIN-NONAUTH IRR database will no longer be possible.
>
> Please be aware of this upcoming event and make alternative
> arrangements if you are presently relying upon routing objects in
> the ARIN-NONAUTH IRR database.

I ran an analysis just now in which I created an intersection between prefixes observed in the BGP default-free zone and exactly matching route:/route6: objects in ARIN-NONAUTH.
I then subtracted exactly matching objects found in the RADB, ALTDB, TC, NTTCOM, LEVEL3, RIPE, and APNIC IRR sources. The result is a list of routes which might experience service disruptions due to missing IRR objects. The below 2,749 Prefix + Origin AS pairings are at risk as a result of ARIN shutting down the ARIN-NONAUTH IRR database. Any potential effects are likely to manifest themselves in the coming 24 - 32 hours. Prior to this announcement, ARIN consulted with its community on the future of its IRR service. I voiced objection and raised concerns about (what appeared to be) limited visibility into what exactly the effects of such a database shutdown would be for the Internet: https://lists.arin.net/pipermail/arin-consult/2021-March/001237.html Other community members also shared concerns: https://lists.arin.net/pipermail/arin-consult/2021-February/001195.html A number of graceful alternative mechanisms were proposed, but not acted upon: https://lists.arin.net/pipermail/arin-consult/2021-March/001240.html One might argue "well, folks had more than a year to move their objects!", but on the other hand, it is entirely possible not all the right people were reached, or in cases where affected parties did receive a communication from ARIN, they perhaps were unable to understand the message. Please check if any of your prefixes are on the below list, and if so, double check whether your IRR administration is able to overcome the disappearance of ARIN-NONAUTH. Godspeed!
This tool might be useful: https://irrexplorer.nlnog.net/ Kind regards, Job

Prefix              OriginAS
100.42.100.0/24     33353
100.42.101.0/24     33353
100.42.102.0/24     33353
100.42.104.0/24     33353
100.42.105.0/24     33353
100.42.106.0/23     33353
100.42.108.0/24     33353
100.42.109.0/24     33353
100.42.96.0/23      33353
100.42.98.0/24      33353
103.11.202.0/24     33517
103.13.12.0/24      38057
103.13.135.0/24     51830
103.15.168.0/23     55532
103.196.22.0/23     7489
103.219.78.0/24     55256
103.219.79.0/24     55256
103.232.224.0/24    125
103.250.176.0/24    134795
103.250.177.0/24    134795
103.250.178.0/24    134795
103.250.179.0/24    134795
103.35.217.0/24     125
103.47.244.0/24     55256
103.88.172.0/24     136271
103.88.173.0/24     136271
103.88.174.0/24     136271
104.128.96.0/20     19233
104.142.128.0/23    33353
104.142.130.0/23    33353
104.142.136.0/22    33353
104.142.140.0/23    33353
104.142.144.0/24    33353
104.142.145.0/24    33353
104.142.146.0/24    33353
104.142.147.0/24    33353
104.142.148.0/24    33353
104.142.149.0/24    33353
104.142.152.0/24    33353
104.142.153.0/24    33353
104.142.156.0/24    33353
104.142.160.0/24    33353
104.142.164.0/24    33353
104.142.165.0/24    33353
104.142.175.0/24    33353
104.142.176.0/24    33353
104.142.177.0/24    33353
104.142.180.0/24    33353
104.142.181.0/24    33353
104.142.184.0/24    33353
104.142.185.0/24    33353
104.142.186.0/24    33353
104.142.187.0/24    33353
104.142.188.0/24    33353
104.142.189.0/24    33353
104.142.190.0/24    33353
104.142.191.0/24    33353
104.142.192.0/24    33353
104.142.224.0/24    33353
104.142.248.0/21    33353
104.142.249.0/24    33353
104.142.251.0/24    33353
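For readers who want to run this kind of check against their own data, the analysis described above (an exact-match intersection between DFZ routes and ARIN-NONAUTH route:/route6: objects, minus exact matches in other IRR sources) reduces to a few set operations. A minimal Python sketch, assuming the BGP table and each IRR source have already been reduced to (prefix, origin AS) pairs; `routes_at_risk` and the sample data are hypothetical (only 100.42.100.0/24 AS33353 is taken from the list above), and this is not the script that produced the list.

```python
import ipaddress
from typing import Iterable, Set, Tuple

Route = Tuple[str, int]  # (prefix, origin ASN)


def _normalise(routes: Iterable[Route]) -> Set[Route]:
    # Canonicalise prefix notation so that e.g. "2001:067c::/32" and
    # "2001:67c::/32" compare as equal exact matches.
    return {(str(ipaddress.ip_network(prefix)), asn) for prefix, asn in routes}


def routes_at_risk(
    dfz: Iterable[Route],
    arin_nonauth: Iterable[Route],
    other_sources: Iterable[Iterable[Route]],
) -> Set[Route]:
    """Hypothetical helper: DFZ routes exactly matched only by ARIN-NONAUTH."""
    # Step 1: routes seen in BGP with an exact route:/route6: match in ARIN-NONAUTH.
    at_risk = _normalise(dfz) & _normalise(arin_nonauth)
    # Step 2: drop anything that also has an exact match in another IRR source.
    for source in other_sources:
        at_risk -= _normalise(source)
    return at_risk


dfz = [("100.42.100.0/24", 33353), ("198.51.100.0/24", 64500), ("192.0.2.0/24", 64501)]
arin_nonauth = [("100.42.100.0/24", 33353), ("198.51.100.0/24", 64500)]
radb = [("198.51.100.0/24", 64500)]
print(sorted(routes_at_risk(dfz, arin_nonauth, [radb])))  # [('100.42.100.0/24', 33353)]
```

Exact matching deliberately ignores covering less-specific objects (Aspect 2 above), so a result like this over-approximates the real impact.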
Re: [routing-wg] RPKI vulnerable?
Hi all, It might be the case that the vulnerability is in the realm of disagreement with some design choices of the past, rather than a traditional CVE hole in one or more software packages. I found the following paper which touches upon the “assumed trust” aspect of RPKI in the relationship between Relying Party and Trust Anchor(s). https://www.researchgate.net/publication/349045074_Privacy_Preserving_and_Resilient_RPKI I’m very interested in discussion about cross-signing schemes. Kind regards, Job -- To unsubscribe from this mailing list, get a password reminder, or change your subscription options, please visit: https://lists.ripe.net/mailman/listinfo/routing-wg
Re: [routing-wg] RFO for RIPE NCC RPKI outage 16 February 2022
On Wed, 16 Feb 2022 at 19:49, Rob Austein wrote: > On Wed, 16 Feb 2022 13:10:27 -0500, Job Snijders wrote: > > On Wed, 16 Feb 2022 at 19:07, Randy Bush wrote: > > > > sra commented to me that, an rp doing protocol fall-over from rrdp to > > rsync, or vice versa, has to do the full download as the data structure > > is so different. i.e. load spike > > > > Perhaps it doesn’t need to be a full load: “rsync --compare-dest” > > (against a previously downloaded and validated set of signed > > objects) offers a path towards optimising the protocol fall-over. > > Even assuming the RRDP client stores and believes the rsync URIs in > the RRDP data stream, and further assuming that the client is clever > enough to write out its RRDP-derived database into a directory tree > which exactly matches an rsync filesystem layout before failing over, The OpenBSD RPKI validator does the above, while maintaining robust cryptographic integrity (in version 7.6 and higher). I hope other validators take inspiration from this, similar to how we (OpenBSD) took inspiration from the Dragon Labs implementation. Your work lives on and on, hat tip to you Rob! :-) > RRDP doesn't convey things like file modification dates that rsync > needs to perform an efficient incremental transfer, so the first rsync > pass is still going to be expensive. > > Not obvious to me that there's any good way to optimize this. YMMV. Ties once pointed me at the GPL rsync “-c” (checksum) option, which makes transfers focus on content rather than on filesystem attributes. In openrsync this is still work to be done, but I see a path :-) Kind regards, Job
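The idea behind both `--compare-dest` and `-c` is to decide what to transfer by comparing content rather than file modification times, which RRDP does not convey. A conceptual Python sketch of that decision, assuming the client keeps its previously validated objects keyed by rsync-style paths and learns a SHA-256 digest per remote path; it does not model rsync's actual delta-transfer algorithm or wire protocol, and the function name and paths are hypothetical.

```python
import hashlib
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def paths_to_transfer(local: Dict[str, bytes], remote: Dict[str, str]) -> List[str]:
    """Return remote paths whose content is not already present locally.

    local:  path -> bytes of previously downloaded and validated objects
    remote: path -> hex SHA-256 of the object currently offered remotely
    """
    return sorted(
        path
        for path, digest in remote.items()
        if path not in local or sha256_hex(local[path]) != digest
    )


local = {"ca/a.roa": b"roa-A", "ca/ca.mft": b"manifest-1"}
remote = {
    "ca/a.roa": sha256_hex(b"roa-A"),        # unchanged content: skipped
    "ca/ca.mft": sha256_hex(b"manifest-2"),  # changed content: transferred
    "ca/ca.crl": sha256_hex(b"crl-1"),       # new object: transferred
}
print(paths_to_transfer(local, remote))  # ['ca/ca.crl', 'ca/ca.mft']
```

Because only content is compared, an RRDP-derived cache written out in rsync layout can make the first rsync pass after a fall-over cheap, as long as the objects have not changed.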
Re: [routing-wg] RFO for RIPE NCC RPKI outage 16 February 2022
On Wed, 16 Feb 2022 at 19:07, Randy Bush wrote: > thanks for the post mortem, ties. > > sra commented to me that, an rp doing protocol fall-over from rrdp to > rsync, or vice versa, has to do the full download as the data structure > is so different. i.e. load spike Perhaps it doesn’t need to be a full load: “rsync --compare-dest” (against a previously downloaded and validated set of signed objects) offers a path towards optimising the protocol fall-over. Kind regards, Job
Re: [routing-wg] rsync://rpki.ripe.net rsyncd limits set too low?
Hi Ties, Thank you for the quick reply. On Wed, Feb 16, 2022 at 03:32:06PM +0100, Ties de Kock wrote: > Ouch. Fallback to rsync due to a DNS misconfiguration (which should > have recovered). Thanks for the confirmation. Indeed, my monitors seem to have returned to 'all clear'. > There are multiple instances behind a load-balancer. The current > storage is on NFS which has a performance limitation - it peaked at > about 80K operations/second (2m average). Welp! That's a lot of IO. Sharing from my own experience with a tiny publication point: I estimate there are about 4,000 RPs deployed on the Internet. Assuming their synchronisation attempts are evenly distributed across the hour, a naive calculation suggests that a new client will attempt to connect roughly every second. > We will follow up with a more detailed post-mortem. Much appreciated! Kind regards, Job
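The back-of-the-envelope estimate above can be extended with Little's law (our addition, not part of the original message) to see why a per-instance connection cap can be tight: the average number of simultaneous rsync sessions is roughly the arrival rate multiplied by how long each session lasts. The 4,000-RP figure is the estimate above; the session durations are purely hypothetical, and synchronised bursts (such as a mass RRDP-to-rsync fallback) push real peaks far above this average.

```python
def arrival_rate(clients: int, period_s: float) -> float:
    """Mean new connections per second if clients spread evenly over the period."""
    return clients / period_s


def mean_concurrency(rate_per_s: float, mean_session_s: float) -> float:
    """Little's law: average sessions in progress = arrival rate * mean duration."""
    return rate_per_s * mean_session_s


rate = arrival_rate(4000, 3600)  # roughly 1.1 new RP connections per second
for session_s in (30, 120, 300):  # hypothetical full-sync durations
    print(f"{session_s:>3} s sessions -> ~{mean_concurrency(rate, session_s):.0f} concurrent")
```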
Re: [routing-wg] rsync://rpki.ripe.net rsyncd limits set too low?
On Wed, Feb 16, 2022 at 03:05:30PM +0100, Job Snijders wrote: > However, it seems RIPE NCC adjusted the default rsyncd settings and > lowered the concurrent connection count from 200 (which already is too > low for RPKI Repository Servers) to 150? Small correction: I appear to have been confused about 200 being the default; according to the documentation, the default is 'unlimited'. Kind regards, Job
[routing-wg] rsync://rpki.ripe.net rsyncd limits set too low?
Hi all, I noticed the RIPE NCC RRDP service (https://rrdp.ripe.net/) became unreachable at 2022-02-16 13:34:10 UTC+0 (and still is down). This RRDP outage event should not pose an issue for most RPKI validators, because most RPKI cache implementations (which follow best practices) will attempt to synchronize via RSYNC in case RRDP is unavailable. However, it seems RIPE NCC adjusted the default rsyncd settings and lowered the concurrent connection count from 200 (which already is too low for RPKI Repository Servers) to 150?

    $ rsync --no-motd -rt rsync://rpki.ripe.net/repository/
    @ERROR: max connections (150) reached -- try again later
    rsync error: error starting client-server protocol (code 5) at main.c(1666) [Receiver=3.1.2]

I'm not familiar with the RIPE RPKI RSYNC service architecture, so the above error could be misleading: perhaps there is a load balancer distributing TCP sessions across multiple backends, each backend configured to serve up to 150 clients? Or perhaps there is a single rsyncd instance (in which case 150 definitely is too low). Is the RIPE NCC RPKI RSYNC service underprovisioned? If yes, why? Kind regards, Job
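The fallback behaviour described above can be sketched as follows. This is a simplified model, assuming hypothetical fetcher callables that raise an exception on failure; real validators such as rpki-client or Routinator have their own retry, timeout, and cache-expiry rules. It illustrates why the rsync connection cap matters: when RRDP goes away, every RP lands on the second branch at roughly the same time.

```python
from typing import Callable, Dict, Tuple

Objects = Dict[str, bytes]


class FetchError(Exception):
    """Raised by a (hypothetical) fetcher when its transport fails."""


def synchronise(
    fetch_rrdp: Callable[[], Objects],
    fetch_rsync: Callable[[], Objects],
    cached: Objects,
) -> Tuple[str, Objects]:
    """Try RRDP first, then rsync, then fall back to the last validated cache."""
    try:
        return "rrdp", fetch_rrdp()
    except FetchError:
        pass  # e.g. https://rrdp.ripe.net/ unreachable
    try:
        return "rsync", fetch_rsync()
    except FetchError:
        # e.g. "@ERROR: max connections (150) reached -- try again later"
        return "cache", cached


def rrdp_down() -> Objects:
    raise FetchError("RRDP notification file unreachable")


def rsync_ok() -> Objects:
    return {"repository/example.roa": b"..."}


print(synchronise(rrdp_down, rsync_ok, cached={})[0])  # rsync
```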
Re: [routing-wg] Open-sourcing of the RIPE NCC’s RPKI core software
Dear RIPE NCC RPKI team, On Wed, Feb 09, 2022 at 10:26:14AM +0100, Bart Bakker wrote: > We are pleased to announce that we have published the source code used > by the RIPE NCC for the RPKI back-end (the RPKI core) under the > 3-Clause BSD licence on Github: https://github.com/RIPE-NCC/rpki-core > The RPKI core is the RIPE NCC's software for creating and maintaining > RPKI objects based on the registry's current status and publishing > these in the repositories. Congratulations on this accomplishment and achieving this milestone! https://sobornost.net/~job/clap.gif :-) In the realm of cryptography, full transparency (unlimited and unrestricted access to source code) is a critical cornerstone for building systems that can be relied upon. > The RIPE NCC hosts the authoritative repository internally. We use the > repository on Github to publish the source code externally. The first > commit is identical to the source code in the RIPE NCC's internal > repository at the time of that commit. The changes between releases > are squashed and published to this repository on deployment, and the > `main` branch reflects the code used by the production CA. Am I right in assuming that, going forward, commits won't be squashed (more than needed)? I imagine it'll be educational for the community to be able to follow the train of thought and storyline of future developments. > We encountered several challenges while preparing this project for an > open-source release. The main challenges were that the system uses > proprietary elements that were part of the revision history and cannot > be made public. Furthermore, it was not possible to review all > historic commits. We plan to present our challenges while > open-sourcing this project at RIPE 84. I look forward to the stories. Kind regards, Job
Re: [routing-wg] Penetration Test Report for RPKI
On Tue, Dec 21, 2021 at 01:23:01PM -0800, Randy Bush wrote: > > We hope you will find these reports useful > > very much so. thank you. Yes, I'd like to echo what Randy says. Thanks for sharing this. > btw, re RIPE-009 - Unencrypted Communication > > in the up/down protocol, objects are cms wrapped and hence signed and > objct authenticated; i.e. i would not panic about transport cia. Indeed. But I can imagine that in a world where virtually all (originally HTTP-only yolo) APIs have now been migrated to HTTPS, any API which ** by design ** is HTTP-only would indeed stand out to pentest researchers. I think it is good the testers noticed this aspect, and also good that RIPE NCC noted in the response "Up-down remains on HTTP and uses a CMS wrapper for authentication." The up/down protocol is somewhat similar in terms of security considerations to how one can transport signed RPKI data from Publication Point (repositories) to Relying Party (validator instances). In that context too, the use of unencrypted transport (like RSYNC, or PIGEON) is deemed acceptable because the threat model is based on a robust interpretation of object-security** to such an extent that transport-security is inconsequential. > otoh, i suspect there could be a path to move your delegated CAs to > TLS; which might be conservative in the long run. Would you mind elaborating on what you mean by the phrase "might be conservative in the long run"? Kind regards, Job ** One crucial cornerstone of the concept of 'RPKI object security' is a thing called "RPKI Manifests". Manifests are an elegant and very powerful idea in the X.509 universe: the ability to securely group objects together. All modern validators use manifests: make sure your validator is updated to the latest version! Read more about what Manifests are here: https://datatracker.ietf.org/doc/html/draft-ietf-sidrops-6486bis-09 This doc is now going through IETF last-call.
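To illustrate why the choice of transport matters little once objects are protected end to end, here is a minimal Python sketch of the manifest check a Relying Party performs after fetching a publication point, whether over RSYNC, RRDP, or anything else. It assumes the manifest's CMS signature and EE certificate have already been validated and its file list decoded into a dictionary (both steps omitted); the all-or-nothing rejection reflects our reading of the manifest draft cited above, and the names are hypothetical.

```python
import hashlib
from typing import Dict, Optional


def usable_files(manifest: Dict[str, str], fetched: Dict[str, bytes]) -> Optional[Dict[str, bytes]]:
    """Check fetched files against a manifest's file list.

    manifest: file name -> hex SHA-256 digest, as listed on the (already verified) manifest
    fetched:  file name -> raw bytes, as received over any transport
    Returns the listed files, or None if the fetch must be treated as failed.
    """
    usable = {}
    for name, expected in manifest.items():
        data = fetched.get(name)
        if data is None or hashlib.sha256(data).hexdigest() != expected:
            return None  # missing or altered object: reject the whole set
        usable[name] = data
    return usable  # files not listed on the manifest are simply not used


roa = b"roa bytes"
manifest = {"example.roa": hashlib.sha256(roa).hexdigest()}
print(usable_files(manifest, {"example.roa": roa, "stray.roa": b"x"}) is not None)  # True
print(usable_files(manifest, {"example.roa": b"tampered"}))  # None
```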
[routing-wg] RPKI ROA MaxLength - feature or misfeature? (UX/security)
Hi all, I'm writing to the working group to initiate some conversation about a long-standing point of confusion in the RPKI ecosystem: the ROA MaxLength field.

What is the ROA MaxLength field?

The data format profile of RPKI ROAs allows an operator to specify the following fields:

* 1 (one) Origin AS
* one or more IPv4 or IPv6 prefixes
* for each IP prefix, a so-called 'MaxLength' value

Operators are allowed to create multiple ROAs with different Origin ASNs covering the same prefix; folks can mix-and-match as needed. The "MaxLength" feature essentially is a macro function (a 'shortcut'): when you create a ROA with the following parameters:

    Prefix: 2001:67c:208c::/48
    MaxLength: 50
    Origin AS: 15562

The above Prefix + MaxLength has the exact same meaning as:

    Prefix: 2001:67c:208c::/48 or
            2001:67c:208c::/49 or
            2001:67c:208c:8000::/49 or
            2001:67c:208c::/50 or
            2001:67c:208c:4000::/50 or
            2001:67c:208c:8000::/50 or
            2001:67c:208c:c000::/50
    Origin AS: 15562

The confusion & a UX experiment proposal
========================================

I suspect that many people think that "xxx/48 maxlength 50" means "the /48, AND the four individual /50s" (mentally skipping over the intermediate /49s). Going back as far as 2011 [1], the concept of "MaxLength" has appeared less than straightforward, and finding a good 'default setting' seems to be a challenge. My experience at NTT taught me that encouraging customers to create IRR "route:" or "route6:" objects that *exactly* match what people intend to announce in the BGP plane greatly simplifies things. Just register what you want to announce, nothing more, nothing less.

A proposal for a UX experiment: would it be beneficial to HIDE the 'maxlength' field (for some period of time) in the RPKI ROA management system hosted by RIPE NCC? If the option isn't there, it can't confuse people. Wouldn't it be better to encourage people to create ROAs that align one-to-one with BGP announcements? (keep in mind: IRR route/route6 objects don't have the notion of maxlength).
Or an enhancement: a button "also create ROAs for all /24s and /48s, but not the intermediate prefix lengths". This would save people a lot of clicking if they want to prepare for maximum de-aggregation.

Is MaxLength used in the wild?
==============================

Only 15% of Validated ROA Payloads (VRPs) under the RIPE NCC Trust Anchor have the MaxLength field set to something other than the aggregate Prefix Length. I'm not entirely convinced that accommodating the 15% is worth the hassle of explaining what the heck MaxLength is. Removing MaxLength from the UI does not in any way impact anyone's ability to create as many ROAs as they deem fit, it just forces people to be precise! :-) Thoughts? Kind regards, Job

[1]: https://labs.ripe.net/author/alexband/using-the-maximum-length-option-in-roas/
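The MaxLength semantics described above, and the proposed 'button', can both be expressed with Python's standard `ipaddress` module. A minimal sketch: `expand_maxlength` enumerates every prefix a single (prefix, MaxLength) ROA entry authorises, and `exact_match_payloads` is a hypothetical illustration of the proposed enhancement, not an existing RIPE NCC feature. The 10.0.0.0/22 aggregate and AS64500 are placeholders.

```python
import ipaddress
from typing import List, Tuple


def expand_maxlength(prefix: str, max_length: int) -> List[str]:
    """List every prefix authorised by one (prefix, maxLength) ROA entry."""
    net = ipaddress.ip_network(prefix)
    if not net.prefixlen <= max_length <= net.max_prefixlen:
        raise ValueError("maxLength must lie between the prefix length and the address size")
    return [
        str(sub)
        for length in range(net.prefixlen, max_length + 1)
        for sub in net.subnets(new_prefix=length)
    ]


def exact_match_payloads(aggregate: str, origin_as: int) -> List[Tuple[str, int, int]]:
    """Hypothetical 'button': the aggregate plus every /24 (IPv4) or /48 (IPv6)
    inside it, each with maxLength equal to its own prefix length."""
    net = ipaddress.ip_network(aggregate)
    deagg_len = 24 if net.version == 4 else 48
    payloads = [(str(net), net.prefixlen, origin_as)]
    if net.prefixlen < deagg_len:
        payloads += [(str(sub), deagg_len, origin_as) for sub in net.subnets(new_prefix=deagg_len)]
    return payloads


for p in expand_maxlength("2001:67c:208c::/48", 50):
    print(p)
print(exact_match_payloads("10.0.0.0/22", 64500))
```

The loop prints the same seven prefixes listed in the example above: a ROA entry with prefix length p and MaxLength m authorises 2^(m-p+1) - 1 prefixes. The button grows quickly for short IPv6 prefixes (a /32 contains 65,536 /48s), so a UI would probably want to warn before creating that many objects.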
Re: [routing-wg] Code Audit Report for RPKI
Dear Bart, RIPE NCC RPKI team, On Fri, Dec 03, 2021 at 12:47:05PM +0100, Bart Bakker wrote: > Continuing from the work we started last year on strengthening our > security compliance, we have asked an external party to carry out a > security audit of our RPKI code. This was an important element in > preparation for open sourcing the RPKI core code, which will be done > in early January 2022. That is welcome news! > We are publishing the security report for the second year in an effort > to increase transparency and trust in the RPKI system. On our website > [0], you will now find the code audit report written by Radically Open > Security 2021 and our response to their findings. > > We hope you will find these reports useful, and we look forward to > your feedback. > > [0] - > https://www.ripe.net/manage-ips-and-asns/resource-management/rpki/security-and-compliance Thank you for sharing this. Both the audit report and the response to the audit report seemed comprehensive and informative. Out of curiosity, will RIPE NCC employ a different (new) auditor in 2022? Periodically changing auditors can potentially help increase the diversity in terms of perspective on code and security. Each auditor represents 'fresh eyes', a useful characteristic when dealing with complex systems. Kind regards, Job
Re: [routing-wg] Add BGPsec support to Hosted RPKI?
On Mon, Oct 11, 2021 at 11:33:40AM +0200, Tim Bruijnzeels wrote: > Why now? There are published RFCs and running code. Time for the next step. > RIPE NCC may have substantial resources, but they are applied > sequentially. Perhaps RIPE NCC can shed a light on the effort > involved, but I suspect it's more than we might think. It is not clear to me what you mean by "more than we might think". I think a standard PKCS#10 / PKCS#7 exchange is less involved than implementing support for an entirely new Signed Object profile? Additionally, to implement BGPsec support in the hosted environment, the developers can take inspiration from prior art. For ASPA no examples exist yet. > In that context, I am not against BGPSec as such, there are just > things that I would like to see first: Thank you for sharing your personal wishlist. The above 'context' seems to depend on an assumption that work progresses sequentially. > 1. Publication Service > > I think this has immediate applicability to anyone considering running a > delegated > CA. It's also in the interest of the ecosystem to limit the excessive growth > in the > number of repositories. > > 2. ASPA > > This is in draft status, but so were ROAs when the production system launched > in > January 2011. I'll share with the working group a brief overview of where ASPA is, which should help explain why ASPA probably is more of a 2022/2023 project. I'm personally involved as co-author of the ASPA specification.

* ASPA is 2 drafts, 0 RFCs: https://datatracker.ietf.org/doc/search/?name=aspa=on=on This means that ASPA has not yet received review from the wider IETF community.
* As of yet no codepoints have been assigned to ASPA: https://www.iana.org/assignments/rpki/rpki.xhtml
* No RPKI-to-Router protocol extension has been proposed for ASPA.
* At the IETF 111 SIDROPS meeting it was suggested to first construct a testbed before moving ASPA forward. See slides 11 and onwards: https://datatracker.ietf.org/meeting/111/materials/slides-111-sidrops-running-code-sidrops-00 The testbed does not exist yet: https://github.com/SIDROPS/ASPA-testbed
* ASPA's running code status page still is empty: https://trac.ietf.org/trac/sidrops/wiki ("AspaImplementations?" is a non-existing wiki page)
* Only a few months ago an issue was discovered in the ASPA verification algorithm in relation to IX Route Servers. This has since been resolved (in -07 of the draft). To me, this indicates that ASPA's specification still is in flux.

All in all, ASPA undeniably is making progress, and I would love for an RPKI-based routing policy signaling mechanism to exist, but there is a lot of work yet to be done. I would suggest starting a discussion about adding ASPA to the RPKI Quarterly planning as soon as it has passed at least "IETF Working Group Last Call". Kind regards, Job
Re: [routing-wg] Support for "Publish in Parent" [RPKI RFC 8181]?
On Thu, Oct 07, 2021 at 04:30:58PM +0200, Tim Bruijnzeels wrote: > If this is added to the RIPE NCC RPKI backlog then I would also > request that LIRs, and PI holders, can have multiple CAs publish at > the RIPE NCC. The reason for this is that one of benefits of running a > delegated CA lies in having the option to sub-delegate to child CAs. > For example one can create child CAs with specific sub-sets of > resources for departments, business units etc. To make this scale it > would very beneficial if those children could publish under the > publication server as well. You make a good point. Kind regards, Job
Re: [routing-wg] Add BGPsec support to Hosted RPKI?
On Wed, Oct 06, 2021 at 04:08:00PM +0200, Tim Bruijnzeels wrote: > Contrary to Route Origin Validation (with ROAs) there is no 'not > found' state. I don't think it is helpful to attempt to put BGPsec and ROAs in the same equivalence class, draw parallels and then conclude that the 'not-found' state is something problematic that is lacking in BGPsec. The concepts and designs of both technologies are very different. > This means that if there is large scale issue with RPKI itself or your > ability to validate RPKI data, BGPSec will end up saying your path is > invalid. I think this is a rather scary property. Indeed, BGPsec has a hard dependency on the RPKI being up and healthy. This is an unavoidable consequence of the design decision to make one technology (BGPsec) dependent on another technology (the RPKI framework). The same of course applies to Route Origin Authorizations: if there is a large scale issue with the RPKI, one's ability to work with given RPKI data is impacted. I think the RIRs and NIRs increasingly understand that their RPKI services are expected to perform flawlessly. Great operational discipline is expected from Trust Anchors (this applies to the TLS WebPKI too). At the end of the day, BGPsec (and RPKI) will not suit everyone or be applicable to every possible situation, and that's OK. > @2- incremental deployment is hard > > BGPSec validation can only result in 'valid' if ALL ASNs on the path > sign. Until that time the path will be 'invalid'. So BGPSec validation > can only really be turned on after this point has been reached, and > until this point has been reached there is no benefit and therefore no > incentive to operators to buy BGP hardware that supports BGPSec, and > publish their router keys as BGPSec certificates. In practice, the characteristic that you describe means that BGPsec deployment can happen incrementally on (for example) private peering between two companies. Indeed, not on 'full table transit' sessions.
For example, in large-scale cloud-provider-to-cloud-provider peering sessions, there often are no downstream ASNs to be seen on either side. The traffic volumes are high, the number of routes on each side fairly low. As BGPsec-signed paths cannot traverse non-BGPsec topology, partial BGPsec deployment forms islands of assured paths. As islands grow to touch each other, they become larger islands. To do incremental deployment, both sides simply need to agree to use BGPsec, and not permit non-BGPsec sessions to establish at the particular intersection point. Keep in mind that a possible solution to prevent 'downgrade attacks' is to not tolerate downgrades... An analogy: I don't think anyone is expecting a BGP session to establish if there is a mismatch in TCP-MD5, TCP-AO, or IPsec configuration between two peers. The goal is for sessions NOT to establish if the password is wrong. > Because of the above I don't think that adding BGPSec support in the > hosted interface will help. Don't get me wrong.. I would *love* for > BGPSec to succeed. I believe the path to success starts with actually making the technology available to increasingly larger groups of people. Literally making it "as hard as possible" to deploy BGPsec (aka "maintaining the status quo") will unsurprisingly lead to BGPsec 'not succeeding'. I don't know what the future holds and whether BGPsec will 'succeed', but I do know there is only one way to find out: making an honest effort to make it work. [ anecdote: I remember that in the early days of IPv6 it was quite hard to get IPv6 blocks in the RIPE region. To receive an IPv6 PI block, you had to be BGP multi-homed. This requirement did not exist for IPv4 PI space. Consequently, many people continued to request IPv4 PI blocks without spending any time on IPv6, because the RIR wouldn't give them IPv6 space to deploy. ] > I would like to be proven wrong in my interpretation.
But as it stands > I think a fundamental discussion is needed (in the IETF as well) on > how it can be made incrementally deployable - such that there is > benefit to early adopters - and get a safe landing in case of errors. > If this can be achieved, or if someone can explain how this is already > achieved, then I would be much less skeptical. I don't know what your interpretation is based on; we clearly lack common experience and perspective on BGP routing. As for 'safe landing' (a nice-sounding phrase): in DNSSEC there are no safe landings either. It is possible to productively use and operate systems in which the 'safe landing' would be to disable the system entirely. I recommend everyone read https://datatracker.ietf.org/doc/html/rfc8374 and https://datatracker.ietf.org/doc/html/rfc8207 to get a feel for why some choices were made and what gotchas exist. Kind regards, Job
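The 'islands' argument and the 'do not tolerate downgrades' policy from this exchange can be modelled in a few lines. A deliberately simplified Python sketch, assuming a route only counts as BGPsec if every AS on its path is a BGPsec speaker; it ignores signature verification, router keys, and the details of RFC 8205, and `session_may_establish` is a hypothetical policy knob, not a vendor configuration option. The ASNs are from the documentation range.

```python
from typing import Iterable, Set


def arrives_as_bgpsec(as_path: Iterable[int], bgpsec_speakers: Set[int]) -> bool:
    """A route stays BGPsec-signed only if every AS it traversed signed it.

    Simplified model: one non-BGPsec hop means the route continues as an
    ordinary unsigned AS_PATH, so the receiver cannot validate it as BGPsec.
    """
    return all(asn in bgpsec_speakers for asn in as_path)


def session_may_establish(require_bgpsec: bool, peer_negotiated_bgpsec: bool) -> bool:
    """Hypothetical 'no downgrade' policy: refuse, rather than fall back."""
    return peer_negotiated_bgpsec or not require_bgpsec


island = {64500, 64501}  # two networks doing BGPsec over private peering
print(arrives_as_bgpsec([64501], island))         # True: direct peering inside the island
print(arrives_as_bgpsec([64501, 64502], island))  # False: AS64502 does not sign
print(session_may_establish(True, False))         # False: no silent downgrade
```

In this model the benefit to early adopters is local: two networks that peer directly can validate each other's routes from day one, without waiting for the rest of the Internet.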
Re: [routing-wg] Add BGPsec support to Hosted RPKI?
On Mon, Oct 04, 2021 at 11:48:12PM +0330, Ehsan Ghazizadeh wrote: > Its an old doc worth reading. You are offering the working group information from 2009. The same year "Call of Duty: Modern Warfare 2" was released. Since then, a number of IETF-consensus documents have been published, for example the BGPsec specification itself. Here is a timeline:

Feb 2014, RFC 7132 - Threat Model for BGP Path Security
Aug 2014, RFC 7353 - Security Requirements for BGP Path Validation
Sep 2017, RFC 8205 - BGPsec Protocol Specification
Sep 2017, RFC 8206 - BGPsec Considerations for Autonomous System (AS) Migration
Sep 2017, RFC 8207 - BGPsec Operational Considerations
Sep 2017, RFC 8208 - BGPsec Algorithms, Key Formats, and Signature Formats
Sep 2017, RFC 8209 - A Profile for BGPsec Router Certificates, Certificate Revocation Lists, and Certification Requests
Apr 2018, RFC 8374 - BGPsec Design Choices and Summary of Supporting Discussions
Jun 2019, RFC 8608 - BGPsec Algorithms, Key Formats, and Signature Formats
Aug 2019, RFC 8634 - BGPsec Router Certificate Rollover
Aug 2019, RFC 8635 - Router Keying for BGPsec

If at this point there still are undocumented gotchas, they aren't gonna be found in a vacuum. Lowering barriers (for example, by making it easier to manage BGPsec in the RPKI dashboard) will increase the number of people able to take a look at BGPsec, and subsequently improve the technology. Kind regards, Job
Re: [routing-wg] Add BGPsec support to Hosted RPKI?
Hi Ehsan, working group, On Mon, 4 Oct 2021 at 14:17, Ehsan Ghazizadeh wrote: > As far as i know, no vendor supports bgpsec, so what's the point of adding > bgpsec support to hosted rpki? > There already are multiple RPKI validators which support BGPsec, multiple signers, and multiple BGPsec-capable BGP implementations. Whether one likes the currently available choices is of course a somewhat subjective matter. :-) BGPsec - at present - definitely isn’t the operators’ “go-to” tool; but the specification has been published via the IETF RFC standards track, received significant scrutiny, and multiple independent implementations have been produced. It takes a lot of community effort to go from 0 to 1, and from 1 to 100. I think adding BGPsec support to hosted RPKI management dashboards might help make BGPsec more mainstream, in turn increasing demand for additional (commercial off-the-shelf) implementations. The effects of obstacles to deployment often appear to compound. > also cause of encryption/decryption process via async encryption method, > it's a resource intensive process so not all routers are able to handle it, > also the more important part is bgpsec changes the normal behavior of bgp, > for instance, update packing (update group) will be disabled. > Indeed, it is always important to use equipment suitable for the job at hand. It might make sense to keep an eye out for BGP routers with AVX512 support in their CPU, rather than attempting to retrofit this type of tech onto 32-bit PowerPC based platforms. :-) > Are we just discussing the support of bgpsec certs on hosted rpki, and we > would discuss bgpsec deployment impacts and open issues later? > I believe the current discussion is about the first aspect. But I love and welcome dialogue on deployment impact and any open issues (so the community can work on addressing each and every issue)!
Evaluating and (potentially) deploying BGPsec in production environments is a multi-year project, just like RPKI-based BGP Origin Validation was. Kind regards, Job
[routing-wg] RPKI planning @ RIPE (Was: Support for "Publish in Parent" [RPKI RFC 8181]?)
Dear Nathalie, group, On Mon, Sep 20, 2021 at 03:11:22PM +0200, Nathalie Trenaman wrote: > Please be aware that the roadmap you mentioned just shows the roadmap > for the current quarter and not for a longer period. Ah, thank you for the clarification. Are there any other items that predate the existence of the "Community Input on Planning" table on [1]? As some RPKI projects are multi-quarter or even multi-year projects, it might be good to expand the community's visibility into the list of RPKI-related work items which RIPE NCC to some degree has accepted, but not yet scheduled. Kind regards, Job [1]: https://www.ripe.net/manage-ips-and-asns/resource-management/rpki/rpki-planning-and-roadmap
[routing-wg] Support for "Publish in Parent" [RPKI RFC 8181]?
Hi working group, In recent mail threads the concepts of "Hosted RPKI" and "Delegated RPKI" came up, but as mentioned by Tim and Rubens, another flavor also exists! A "hybrid" between Delegated and Hosted, informally known as "publish in parent" (aka RFC 8181 compliant Publication Services). There are multiple benefits to the general RPKI ecosystem when RIRs and NIRs support RFC 8181:

* Resource Holders are relieved from the responsibility to operate always-online RSYNC and RRDP servers.
* Reducing the number of Publication servers reduces overall resource consumption for Relying Parties. Consolidation of Publication Servers improves efficiency and is generally considered advantageous.
* Helps avoid "reinventing the wheel": it might be better to have a small group of experts build a globally performant and resilient infrastructure that serves everyone, rather than everyone building the 'same' infrastructure.

Other RIRs and NIRs are also working on RFC 8181 support. RFC 8181 is relatively new so it'll take some time before we see universal availability.

NIC.BR (available): https://registro.br/tecnologia/numeracao/rpki/
APNIC (available): https://blog.apnic.net/2020/11/20/apnic-now-supports-rfc-aligned-publish-in-parent-self-hosted-rpki/
ARIN (planned): https://www.arin.net/participate/community/acsp/suggestions/2020/2020-1/

Is implementing RFC 8181 support something RIPE NCC should add to the https://www.ripe.net/manage-ips-and-asns/resource-management/rpki/rpki-planning-and-roadmap ? What do others think? Kind regards, Job Relevant documentation: https://datatracker.ietf.org/doc/html/rfc8181
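To make "publish in parent" more concrete: in RFC 8181 a child CA sends its signed objects to the parent's publication server as small XML queries. A minimal Python sketch of building one `<publish>` query, based on our reading of RFC 8181 (default namespace, `version="3"`, a `uri` attribute, an optional `hash` of the object being replaced, base64 content). It omits the CMS wrapping and the HTTP POST that real exchanges use, and the repository URI and object bytes are hypothetical.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET
from typing import Optional

# XML namespace of the RFC 8181 publication protocol.
NS = "http://www.hactrn.net/uris/rpki/publication-spec/"


def publish_query(uri: str, obj: bytes, replaces: Optional[bytes] = None) -> str:
    """Build an (unsigned) RFC 8181 <publish> query for one object."""
    ET.register_namespace("", NS)  # serialise with NS as the default namespace
    msg = ET.Element(f"{{{NS}}}msg", {"version": "3", "type": "query"})
    attrs = {"uri": uri}
    if replaces is not None:
        # When overwriting, the client names the object it expects to replace.
        attrs["hash"] = hashlib.sha256(replaces).hexdigest()
    publish = ET.SubElement(msg, f"{{{NS}}}publish", attrs)
    publish.text = base64.b64encode(obj).decode("ascii")
    return ET.tostring(msg, encoding="unicode")


print(publish_query("rsync://repo.example.net/repo/example.roa", b"DER bytes"))
```

The point of the design is that the child CA keeps its own keys and signs its own objects, while the always-online RSYNC and RRDP serving is left to the parent.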
Re: [routing-wg] Add BGPsec support to Hosted RPKI?
Hi Rubens, others, On Sun, Sep 19, 2021 at 08:06:54PM -0300, Rubens Kuhl wrote: > Our experience in Brazil is that delegated RPKI is not much of an > issue provided its software deployment is easy enough. Krill + Lagosta > + Up/Down activation + Upwards ROA publishing adds to being really > effective. Good to hear from Brazil! Indeed a number of organizations have worked hard to remove as many barriers as possible towards the Delegated RPKI. Impressive progress. My message was not intended to take away from "Delegated RPKI" deployments (I run one myself!), but rather to suggest that the "Hosted RPKI" Dashboard should _also_ make it possible to certify & publish BGPsec Router keys. I suspect that "Hosted RPKI" will always be popular: clearly many operators feel comfortable outsourcing the issuance & publication of their ROAs to the RIR. I think it is important to study feature gaps between "Hosted" and "Delegated". Kind regards, Job ps. Q: What about ASPA? A: Why not both? :-) I'm working to start an IANA "Early Allocation" procedure to obtain codepoints for ASPA. When progress has been made I'll circle back to routing-wg@ in a new thread, unless someone beats me to it. :-)
[routing-wg] Add BGPsec support to Hosted RPKI?
Dear all, [ TL;DR: What does the working group think about supporting an extension to the RPKI Dashboard to enable publication of BGPsec certs? ] At the moment the hosted "RPKI Dashboard" at https://my.ripe.net/#/rpki only permits Resource Holders to create RPKI objects of one specific type: ROAs. However, a wider range of RPKI cryptographic product types also exists, for example: BGPsec Router Certificates [RFC 8209]. BGPsec is an RPKI-based technology which enables network operators to transitively validate whether a given BGP UPDATE - indeed - passed through the Autonomous Systems listed in the path. One way to think of BGPsec is as an ECDSA-protected network of channels between a receiving EBGP node and one (or many) routers in the BGP route's Origin AS. I think BGPsec can be useful to protect "private peering" at large scale, and another use case is to increase confidence in routing information distributed via IXP Route/Blackhole Servers. Right now, routing protocol researchers and network operators wishing to publish BGPsec Router Keys also have to learn how to master "Delegated RPKI": a deployment model with a steep learning curve. I think there are benefits to the community if RIPE NCC appends an activity to the "RPKI Planning and Roadmap" to implement procedures to sign and publish BGPsec Router Keys via a PKCS#10 / PKCS#7 exchange, callable via both API and dashboard WebUI. What do others think? Kind regards, Job Relevant documentation: https://datatracker.ietf.org/doc/html/rfc8209 https://datatracker.ietf.org/doc/html/rfc8635
Re: [routing-wg] request for feedback: a RPKI Certificate Transparency project?
Hi Tim, > But this should start with a problem statement which is discussed in > the IETF. The context of the RPKI standards matter and a lot of the > contributors to those standards are not active here. It is not uncommon for initiatives to start in a special interest group outside the IETF, and then later be presented to the appropriate IETF working group. For example, the origins of the development of BGP Large Communities can be traced back to a NetNod meeting [1]; later the design was influenced by feedback received at the Routing WG @ RIPE 72, and finally the specification was published as an RFC via the IETF IDR WG. This message [2] is intended to start a conversation in the RIPE community specifically about the topic of Certificate Transparency and RPKI, because CT appears to have critically improved the WebPKI. > As it stands I think that asking the RIPE NCC to make a big investment > without further analysis is questionable. I agree, more study is needed before committing to big investments. Gauging community interest is part of the exploratory phase of the process. > It is also not sufficiently clear to me how and why this problem is > more urgent than other investments in RPKI, I don't recall anyone suggesting this is "more urgent than other investments"? > e.g. providing a Publication Server service for members, and investing > in support for ASPA. RIPE NCC maintains a list of plans here [4]. Neither a Publication Server service nor ASPA is listed as of yet. Specifically about ASPA: as per the last SIDROPS meeting (IETF 111) [3], I think ASPA is pending the development of a testbed between various vendors coordinated through that IETF working group. The pace at which ASPA moves along will depend on market forces.
And do keep in mind that deployment of ASPA would mean we (network operators) collectively increase our dependency on the RPKI even further, which in my opinion strengthens the case to talk about additional oversight and auditability of Trust Anchors ... perhaps through Certificate Transparency! Kind regards, Job [1]: http://largebgpcommunities.net/2016/where-did-large-communities-start/ [2]: https://www.ripe.net/ripe/mail/archives/routing-wg/2021-September/004397.html [3]: https://www.youtube.com/watch?v=DtnFulym8CQ [4]: https://www.ripe.net/manage-ips-and-asns/resource-management/rpki/rpki-planning-and-roadmap
Re: [routing-wg] request for feedback: a RPKI Certificate Transparency project?
On Fri, Sep 10, 2021 at 11:39:39AM +0200, Tim Bruijnzeels wrote: > I think all would agree that transparency is good. > > A key difference between RPKI and most other PKIs is that in the RPKI > all objects are published in the open for all the see. Small nitpick: all objects are SUPPOSED to be published, in the open, for all to see. However it is important to keep in mind we cannot assume all objects were published in a way for all to see. > As you mentioned your RPKI validator may miss intermediate state > changes if it retrieves objects using rsync, but the RRDP protocol > supports deltas, see [1]. > > I believe that transparency can most easily be achieved by ensuring > that these deltas are preserved, and that they cannot be modified. RRDP is an unauthenticated and unsigned protocol. It is possible for a Publication Point to present different RRDP deltas to one RP compared to what they present to another RP. Archiving RRDP deltas is interesting, but IMHO happens too late in the pipeline for TA/CA audit purposes. RRDP is not a replacement for Certificate Transparency, both technologies solve different problems. Kind regards, Job
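The point above, that archived RRDP deltas cannot by themselves prove every RP saw the same thing, can be illustrated. An RRDP notification file (RFC 8182) lists each delta's serial and SHA-256 hash, so two Relying Parties could compare what they were served. The Python sketch below assumes both notification files were already fetched from different vantage points; the helper names `delta_hashes` and `split_view_serials` are hypothetical. It can only detect divergence between the observers that actually compare notes, which is the gap an append-only log such as Certificate Transparency is meant to close.

```python
import xml.etree.ElementTree as ET

# RRDP XML namespace as defined in RFC 8182.
RRDP_NS = "{http://www.ripe.net/rpki/rrdp}"


def delta_hashes(notification_xml):
    """Return (session_id, {serial: sha256_hex}) from an RRDP notification file."""
    root = ET.fromstring(notification_xml)
    deltas = {int(d.get("serial")): d.get("hash").lower()
              for d in root.findall(f"{RRDP_NS}delta")}
    return root.get("session_id"), deltas


def split_view_serials(view_a, view_b):
    """Serials for which two observers were served different delta hashes.

    Only deltas present in both views can be compared, and an empty result
    says nothing about what other Relying Parties were served.
    """
    session_a, deltas_a = delta_hashes(view_a)
    session_b, deltas_b = delta_hashes(view_b)
    if session_a != session_b:
        raise ValueError("different RRDP sessions; deltas are not comparable")
    common = deltas_a.keys() & deltas_b.keys()
    return sorted(s for s in common if deltas_a[s] != deltas_b[s])
```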
[routing-wg] request for feedback: a RPKI Certificate Transparency project?
Dear all, With summer turning to fall in the Northern Hemisphere, yet again a new school year is ahead of us! :-) I hope you all are well. I'm writing the group to solicit feedback for me and others to consider during upcoming deliberations about activity plans, but even more so as an RPKI enthusiast who is curious to learn what others see as potential future evolutions of the RPKI technology stack. [ TL;DR - Ask to the routing community: is there interest in coordinating and supporting an industry-wide project to introduce the principles of "Certificate Transparency" to the RPKI? The project size could be substantial, but so are the upsides. ] Intro: global deployment & operation of the RPKI is a multi-decade project == Over the last 21 years this industry has collectively helped grow and nurture 'Secure BGP' [1] into the RPKI/BGP deployment as we know it today: the smallest and largest of networks in the Default-Free Zone core are anchoring their BGP routing decisions to an RPKI covering 31% of space, which in turn helps connect billions of End Users to the Internet. From my personal perspective [10], the RPKI has now reached some level of maturity. Perhaps now is the time for some of our community's focus to shift towards designing and implementing innovations on top of the current RPKI, without jeopardizing its current plateau of stability. What does trusting a Trust Anchor mean? === Some people have (correctly!) pointed out that RPKI Trust Anchor (TA) operators technically can issue certificates related to any Internet Number Resource, a consequence of some people considering it a necessity for day-to-day TA operations that "all resources" [5] be subordinate to the TA. While I am aware of some minor concerns about the "all resources" framework (and I personally see room for improvement!), for me the big question is not "who do I trust?", but "what did they actually do after I started trusting them?".
In this reality where RIRs can sign "everything" and I (as RP operator) can cryptographically verify that what I observed through periodic polling [6] was indeed signed by my locally configured Trust Anchor(s)... one thing seems to be missing! I don't know anything about what my RP didn't observe! :-) Perhaps some certificates were issued and very quickly revoked concerning subordinate Internet Number Resources of great importance to me? How would I know if I didn't see it myself? I don't expect Trust Anchor operators to never make any mistakes, but I do wish to be in a position where I can assess past performance, and can compare third-party audit logs, to inform my future decisions! To me it seems important to increase our collective visibility into TA/CA actions & mistakes. ("Mistakes" meaning the issuance or revocation of certificates non-compliant with the policy outlined in RIPE-751). Most Internet Routing incidents are analyzed after the fact through the use of Route Views [8], RIPE RIS [9], or visualisation tools like BGPlay. Enabling everyone to "scrub back in time" greatly enhances our group's ability to understand what transpired and how to prevent it going forward. What is the RPKI equivalent of BGPlay at a cryptographically auditable level of detail? ... maybe Certificate Transparency? [7]. Copying good ideas from other PKIs: Certificate Transparency The RPKI is built on top of X.509 and CMS tech. Developments in other X.509 special interest communities (such as WebPKI [2], aka "the https:// experts") may turn out to be amazing ideas or methods worth copying into 'our Internet Number Resource PKI' ecosystem. One of the inventors [3] of public-key cryptography (a core concept in the RPKI) also came up with an idea known as "Merkle Trees" [4]. This concept can be used to construct inter-domain "append-only" logging facilities, which can be incredibly useful to help increase trust in a Trust Anchor in an "assumed trust" model.
I'll try to explain why below! A key concept in Certificate Transparency is that a CA ('the signer') - ahead of time - shares with select third parties (so-called 'CT Logs') its commitment to sign a given digital object. After acknowledgement from the CT Log(s), the signer proceeds to sign and publish the RPKI object. The CT Logs use Merkle Trees to allow external auditors to 'losslessly replay' all observations of certificate issuance from a given CT Log, and compare CT Logs with each other. Implementation of Certificate Transparency would provide this community with something analogous to the RIPE Database "Historical Queries". The major difference being that all logged data comes with cryptographic assurances, and the data can be hosted and audited by both RIPE NCC and any third parties (anyone with Internet access!). RIPE NCC sending precertificate information to CT Logs?
Re: [routing-wg] RPKI Quarterly Planning
On Tue, Jul 13, 2021 at 05:25:11AM +0200, Daniel Karrenberg wrote: > It might also be that the operational community has chosen other fora to > discuss because this working group is not working. What a strange thing to say. Of course there are other fora to discuss RPKI; one of the most important ones is IETF's SIDROPS working group (which is quite active!). As for the road map - RIPE NCC indicated feedback can be shared with the routing-wg@ or with r...@ripe.net. I myself opted to try the latter route to reiterate a request to publish dashboards and graphs about the RPKI service, which resulted in 'RPKI-2021-#01' being added to the roadmap. The motivation behind RPKI-2021-#01 is that many IXPs offer publicly accessible graphs à la: https://www.ams-ix.net/ams/documentation/total-stats https://portal.linx.net/ https://www.jpnap.net/ix/traffic.html https://www.netnod.se/ix/statistics https://de-cix.net/en/locations/frankfurt/statistics When incidents happen, these graphs enable the IX participants to quickly understand whether 'something is wrong', because humans are really good at pattern recognition. I imagine that developing more insight into the RIPE NCC RPKI service will offer the community benefits similar to what the community gleans from these public IX stats, hence the ask for RPKI-2021-#01. Kind regards, Job
Re: [routing-wg] RPKI Quarterly Planning
Hi, On Mon, Jul 12, 2021 at 10:23:20AM +0200, Daniel Karrenberg wrote: > Natanlie pointed us to > https://www.ripe.net/manage-ips-and-asns/resource-management/rpki/rpki-planning-and-roadmap > a while ago. Among other things this says: > > “In preparation for the improved RPKI repository architecture, the > distributed nature of the RRDP repository is going to be implemented using > containers and krill-sync that pulls data from the centralised on-premise > repository. This greatly simplifies smooth transitioning between publication > servers without any downtime. > > NOTE: We are not referring to cloud technologies here, just to our internal > deployment technologies.” > > The silence here worries me. What silence?! Over the last few months there have been quite a few mail threads in this working group about RPKI and RPKI outage incidents, and NCC staff have provided updates during the virtual RIPE meetings in the Routing WG slot. To me the roadmap seems to reflect the sentiment that reliability is the key objective at this moment in time. > I would like to see some feedback from this group whether this is what > you want to see happening. The RIPE Routing WG is the forum for giving > guidance to the RIPE NCC about RPKI. I know other channels exist too > and that is fine. I also know that individuals here seem to be happy > with what is happening. However private channels and conversations are > not the way RIPE does this. This group is where the RIPE NCC looks > for guidance and where that guidance gets properly archived and > responded to. To be honest I am not sure what the purpose of krill-sync is. In May 2021 [1] extensive testing was conducted with the help of the NLNOG RING to see if krill-sync could be used to power the RSYNC service, but it turned out there were multiple issues with krill-sync making it a suboptimal choice.
I believe RIPE NCC ended up deploying a different solution to serve RSYNC - and my hope is that the recently-achieved stability is here to stay, because the current setup seems to work quite nicely. As for a 'hidden RRDP master', I fail to see what the benefits of krill-sync are compared to, say, Varnish [2] (or Squid [3]), or to what is already achieved by using a CDN to deliver the RRDP deltas. Maybe the krill-sync reference is an outdated comment? Kind regards, Job [1]: https://www.ripe.net/ripe/mail/archives/routing-wg/2021-May/004345.html [2]: https://varnish-cache.org/ [3]: http://www.squid-cache.org/
Re: [routing-wg] request to enable ICMP echo-reply on rpki.ripe.net?
On Fri, May 07, 2021 at 03:29:44PM +0200, Nathalie Trenaman wrote: > Our ops team just enabled ICMP echo-reply on rpki.ripe.net. Thank you. Have a good weekend! Kind regards, Job
Re: [routing-wg] request to enable ICMP echo-reply on rpki.ripe.net?
On Wed, May 05, 2021 at 12:52:51PM +0200, Kurt Kayser wrote: > you surely know that every enabled protocol/port is a potential threat. Sometimes disabling a protocol or port is a potential threat (because hindering troubleshooting efforts harms network stability). RIPE NCC is the only RIR that does not respond to ICMP Echo Requests on their main RPKI service. Kind regards, Job
$ ping -c 1 rpki.afrinic.net
PING rpki.afrinic.net (196.216.2.26): 56 data bytes
64 bytes from 196.216.2.26: icmp_seq=0 ttl=48 time=183.631 ms
--- rpki.afrinic.net ping statistics ---
1 packets transmitted, 1 packets received, 0.0% packet loss
round-trip min/avg/max/std-dev = 183.631/183.631/183.631/0.000 ms

$ ping -c 1 rpki.apnic.net
PING rpki.apnic.net (203.119.101.18): 56 data bytes
64 bytes from 203.119.101.18: icmp_seq=0 ttl=240 time=315.433 ms
--- rpki.apnic.net ping statistics ---
1 packets transmitted, 1 packets received, 0.0% packet loss
round-trip min/avg/max/std-dev = 315.433/315.433/315.433/0.000 ms

$ ping -c 1 rpki.lacnic.net
PING rpki.lacnic.net (200.3.14.185): 56 data bytes
64 bytes from 200.3.14.185: icmp_seq=0 ttl=49 time=204.922 ms
--- rpki.lacnic.net ping statistics ---
1 packets transmitted, 1 packets received, 0.0% packet loss
round-trip min/avg/max/std-dev = 204.922/204.922/204.922/0.000 ms

$ ping -c 1 rpki.arin.net
ping: Warning: rpki.arin.net has multiple addresses; using 199.71.0.150
PING rpki.arin.net (199.71.0.150): 56 data bytes
64 bytes from 199.71.0.150: icmp_seq=0 ttl=51 time=152.630 ms
--- rpki.arin.net ping statistics ---
1 packets transmitted, 1 packets received, 0.0% packet loss
round-trip min/avg/max/std-dev = 152.630/152.630/152.630/0.000 ms
[routing-wg] request to enable ICMP echo-reply on rpki.ripe.net?
Hi RIPE NCC, hi all, In today's troubleshooting adventure, an operator experienced difficulty pinpointing where exactly a connectivity issue between them and rpki.ripe.net (193.0.6.138 + 2001:67c:2e8:22::c100:68a) resided. It would be helpful if RIPE NCC re-enabled responses to ICMP echo requests originating from the Internet. Would it be possible to adjust the firewall settings to accommodate troubleshooting and monitoring? Right now connectivity testing has to be performed directly against the rsync daemon's internet-exposed TCP port (873), but it would be much cheaper and faster for both the tester and the service hoster if ICMP echo requests could be used as an early warning system rather than the rsync service itself.
$ ping -c 6 rpki.ripe.net
PING rpki.ripe.net (193.0.6.138): 56 data bytes
--- rpki.ripe.net ping statistics ---
6 packets transmitted, 0 packets received, 100.0% packet loss
The above test result differs from the result of sending echo requests to molamola.ripe.net or manus.authdns.ripe.net. Kind regards, Job
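For illustration, the workaround described above (probing the rsync daemon's TCP port instead of pinging) might look like the following Python sketch. The function name `tcp_probe` and its defaults are hypothetical; it only measures how long a TCP handshake to the given port takes, and each probe costs the service a real (if short-lived) connection, which is part of why ICMP echo is the cheaper early-warning signal.

```python
import socket
import time


def tcp_probe(host, port=873, timeout=3.0):
    """Return the TCP connect time in milliseconds, or None if the connect fails.

    A completed three-way handshake shows the host and port are reachable;
    it says nothing about whether the rsync service will serve correct data.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000.0
    except OSError:  # refused, unreachable, or timed out
        return None
```

Usage would be along the lines of `tcp_probe("rpki.ripe.net")`, which returns a number of milliseconds when the handshake succeeds and `None` otherwise.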
Re: [routing-wg] TC x IRRd 4.2
Dear Rubens, all, On Tue, Apr 27, 2021 at 10:18:32PM -0300, Rubens Kuhl wrote: > TC IRR, an IRR operator focused on Brazilian networks, just changed to > IRRd 4.2. The new version allowed TC to deploy RPKI validation > (thanks NTT for sponsoring that development) and expose HTTPS > endpoints for WHOIS and submission that we hope will foster innovation > around the database. > > Every precaution was taken for this migration to be seamless for other > IRR operators, including matching of serial numbers. Every IRR server > that mirrored TC and supported -j status query was verified that it > followed and still correctly follows database journals. > > But if anything appears broken, please let me know or e-mail > db-ad...@bgp.net.br. Congratulations to you and the TC team for reaching this milestone! TC's use of RPKI-based IRR Object filtering combined with the efforts of NIC.BR, IX.br, and LACNIC to promote RPKI in Brazil, make the Brazilian community a positive example of a seamless integration between IRR and RPKI. Thank you for your efforts to increase the data quality of the TC registry. Kind regards, Job
[routing-wg] How BGP routes can get 'stuck' in the Default-Free Zone
Dear group, I'd like to draw your attention to an excellent article on an intricate interaction between BGP and TCP which can result in 'zombie routes' in the BGP Default-Free Zone. https://blog.benjojo.co.uk/post/bgp-stuck-routes-tcp-zero-window My current running theory on the root cause of some mishaps in the global routing system is that certain BGP implementations can end up in a broken state where such systems will still generate and send out KEEPALIVE messages, but are unable to process other BGP messages (and such a system instructs all its peers to not send new data by signalling a zero TCP receive window). This is "Problem #1". "Problem #2" is that almost all BGP implementations are unable to robustly deal with systems suffering from Problem #1. Almost all BGP implementations assume that when KEEPALIVE messages don't make it across the wire, the remote system will initiate the session teardown. But of course, if the remote system is in such a broken state that it can't issue session teardowns ... the combined system state is perpetually broken. The Security Section of https://datatracker.ietf.org/doc/html/draft-spaghetti-idr-bgp-sendholdtimer elaborates on three detrimental facets of the above situation. It is quite rare for systems to end up in the "Problem #1" state, but when it happens, all systems connected to the broken node are probably better off disconnecting from such a system than perpetually forwarding (and potentially blackholing) Internet traffic into the broken system. Kind regards, Job
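The idea behind a "send hold timer" can be sketched as a small state machine: if data is queued for a peer but no bytes have been written for longer than a configured interval (for example because the peer keeps advertising a zero TCP receive window), the local side tears the session down itself instead of waiting for the broken peer to do so. The Python below is a toy model; the class name, method names and the 480-second value in the example are illustrative assumptions, not values or interfaces taken from the draft or from any BGP implementation.

```python
import time


class SendHoldTimer:
    """Toy model of a BGP send hold timer (illustrative only).

    The broken peer's KEEPALIVEs still arrive, so the ordinary hold timer
    stays satisfied; this timer instead watches our own ability to send.
    """

    def __init__(self, send_hold_time, clock=time.monotonic):
        self.send_hold_time = send_hold_time  # seconds without send progress
        self.clock = clock
        self.last_progress = clock()

    def on_bytes_written(self, n):
        # Any forward progress on the socket restarts the timer.
        if n > 0:
            self.last_progress = self.clock()

    def should_teardown(self, bytes_queued):
        # With nothing queued there is nothing to be stuck on.
        if bytes_queued == 0:
            self.last_progress = self.clock()
            return False
        return self.clock() - self.last_progress > self.send_hold_time
```

Injecting the clock keeps the logic testable: with a 480-second limit, a session that last made progress at t=300 s and still has data queued is torn down at t=800 s, but not at t=700 s.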
Re: [routing-wg] Issue affecting rsync RPKI repository fetching
Dear Ties, group, Thank you for the outline. On Wed, Apr 14, 2021 at 02:33:37PM +0200, Ties de Kock wrote: > The RPKI application does not support writing the complete repository to disk > for each state (as needed for spooling the repository as proposed in scripts). > Synchronously writing every state of the repository to disk is not feasible, > given our update frequency and repository size. Functionality for > asynchronously writing the repository to disk needs to be developed. We have two > paths to develop this: > - The first is a new daemon that writes to disk from the database state at a > set interval. > - The second one is using RRDP as a source of truth and writing the > repository to disk. > Furthermore, we would need to migrate the storage from NFS to have faster > writes. > > Both approaches need an extended period for validation and we are not able to > deploy these within a few weeks. The latter approach (using RRDP) has less risk > and is the option we are aiming for at the moment. We plan to release the new > publication infrastructure in Q2/Q3 2021 and hope to migrate earlier. The "RRDP as source of truth" approach indeed seems the more appealing (and simpler!) option. I would encourage the NCC to follow that path. In the meantime, can https://www.ripe.net/support/service-announcements/service-announcements/current be updated to reflect that there are known race conditions and problems with the RIPE NCC RSYNC service? Are there any other tweaks the NCC can think of that reduce the operational pain? Maybe increasing the publication interval? Kind regards, Job
[routing-wg] RPKI: how to migrate an entire industry from RSYNC to RRDP?
Hi all, Some might be wondering what the deal is with RSYNC and RRDP. Why is it critical to continue to support RSYNC in the mid-term? What's the industry's plan to migrate from RSYNC to RRDP? TL;DR - All RIRs need to support both RSYNC and RRDP until at least 2024. - All RRDP-capable validators need to make sure they are fully backwards compatible with RSYNC, until RIRs no longer observe RSYNC traffic. Some background: The development of the RPKI technology stack began more than a decade ago in the IETF. From the start, RSYNC was the preferred synchronisation protocol between RPKI publication servers (such as 'rpki.ripe.net') and Relying Parties (the likes of Telia, NTT, Amazon, me, and you). RSYNC was picked for a number of reasons: it was available & easy. A core concept of the RPKI technology stack is that RPKI objects can be transported via any means: carrier pigeon (RFC 2549 ;-), USB stick, FTP, RSYNC, ... anything! This is possible because RPKI exclusively relies on 'object security': the RPKI objects themselves contain all information that is required to perform X.509-based validation. As time went by, a second approach was developed to synchronize RPs to fresh data generated by CAs. It was recognized that where RSYNC servers need to calculate the 'difference' between what the client has and what the server has (right after the RSYNC client connects), such data could also be generated a priori and stored in static files. Pre-generating such a 'journal' of all changes in a repository is considered to be far more efficient than calculating it on the fly. The RRDP protocol has many appealing properties! In 2017 the RRDP protocol was published as an IETF RFC (RFC 8182), as a 'nice to have' synchronization protocol for the RPKI. Since then, more and more Publication Server operators and RP software implementers have worked to support the new RRDP protocol alongside the old RSYNC protocol. With two options available, how to migrate?
--- This Gordian Knot has two aspects: all deployed RPKI repository operators have to support RRDP, and all deployed Relying Party software has to support RRDP. This means that for a successful transition, for a moment in time, all stakeholders have to support both RSYNC and RRDP. The industry has not yet reached that point. I expect this to happen somewhere in 2023. At the time of writing, not all Relying Party software, and not all RPKI Repository Operators, support RRDP. Based on various communications in the IETF it is clear everyone is working towards implementing RRDP support. However, producing a safe implementation of RRDP is not a trivial task; it takes time. As RSYNC existed first and RRDP came later, everyone should be allowed ample time to make the transition. While everyone waits for everyone to deploy RRDP-capable software, the trick is to PREFER synchronizing via RRDP (and if it fails, try RSYNC). Relevant Internet-Draft: https://datatracker.ietf.org/doc/draft-ietf-sidrops-prefer-rrdp/ This concept is somewhat analogous to 'Happy Eyeballs': for a period of time many considered it advantageous for global IPv6 deployment to give IPv6 just a little bit of a head start compared to IPv4. People knew that purposefully degrading IPv4 would not motivate people to embrace IPv6. Also, preferring RRDP (and only using RSYNC in case of RRDP failure) makes life easier for RPKI Repository operators: it should be possible for them to temporarily disable the RRDP webserver and expect clients to use RSYNC instead. Knowing that clients will gracefully fall back to RSYNC lowers the barrier to deploy RRDP.
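The "prefer RRDP, fall back to RSYNC" strategy can be sketched in a few lines of Python. Everything here is hypothetical scaffolding: the two fetcher callables stand in for real RRDP and rsync clients, and the RRDP URI is assumed to come from the CA certificate's notification-URI access description (absent for repositories that only offer RSYNC). The draft linked above describes the actual RP behaviour in more detail.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical fetcher type: takes a repository URI and returns
# {object_uri: object_bytes}, raising an exception on any failure.
Fetcher = Callable[[str], Dict[str, bytes]]


def sync_repository(rrdp_uri: Optional[str], rsync_uri: str,
                    fetch_rrdp: Fetcher, fetch_rsync: Fetcher
                    ) -> Tuple[str, Dict[str, bytes]]:
    """Prefer RRDP; use RSYNC only when RRDP is unavailable or fails.

    Returns (transport_used, objects). Thanks to object security, the
    objects still need full cryptographic validation afterwards, whichever
    transport delivered them.
    """
    if rrdp_uri:
        try:
            return "rrdp", fetch_rrdp(rrdp_uri)
        except Exception:
            pass  # e.g. the RRDP web server was temporarily disabled
    return "rsync", fetch_rsync(rsync_uri)
```

The broad `except` is deliberate in this sketch: any RRDP problem (network error, malformed XML, hash mismatch) should lead to the same graceful fallback rather than to a failed synchronisation.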
Once all RP implementations have embraced the 'prefer RRDP' strategy, and those implementations have trickled down into the hands of network operators and been deployed in the field, Repository Operators will observe fewer and fewer clients connecting to the RSYNC service and more and more syncing via RRDP, to the point where it becomes self-evident that publication via RSYNC can perhaps even be decommissioned altogether. TL;DR - general availability of software which prefers RRDP over RSYNC, combined with patience, should be a sufficient plan to migrate! :) Current status -- Current versions of OpenBSD rpki-client support RSYNC. The team is actively working to also support RRDP. The hope is to release a stable version later this year. OpenBSD supports releases for one year, thus any deprecation of RSYNC services should be postponed at least until Spring 2023 to avoid disenfranchising existing deployments in the field. The RIPE NCC Validator RRDP implementation is broken. It is trivial for any RRDP Repository Operator to remotely crash the entire RIPE NCC validator process. Luckily the software is almost End-Of-Life and soon won't be relevant anymore. Current versions of Routinator are unable to fall back to RSYNC. However in November 2020, the team indicated they would fix this in the next release (which has
Re: [routing-wg] Issue affecting rsync RPKI repository fetching
On Mon, Apr 12, 2021 at 02:12:10PM +0100, Nick Hilliard wrote: > Erik Bais wrote on 12/04/2021 11:41: > > This looks to be a 3 line bash script fix on a cronjob … So why > > isn’t this just tested on a testbed and updated before the end of > > the week ? > > cache coherency and transaction guarantees are problems which are > known to be difficult to solve. Long term, the RIPE NCC probably > needs to aim for the degree of transaction read consistency that might > be provided by an ACID database or something equivalent, and that will > take time and probably a migration away from a filesystem-based data > store. > > So the question is what to do in the interim? The bash script that > Job posted may help to reduce some of the race conditions that are > being observed, but it's unlikely to guarantee transaction consistency > in a deep sense. Does the RIPE NCC have an opinion on whether the > approach used in this script would help with some of the problems that > are being seen and if so, would there be any significant negative > effects associated with implementing it as an intermediate patch-up? Perhaps the script [0] can be of use, or perhaps not. The script assumes a POSIX-like environment. It is not clear to me what software process runs where and how RIPE NCC runs their publication service. The core problem seems to me that while RSYNC clients are connected, the RIPE NCC RPKI server appears to 'pull the rug' from underneath them. This practice reduces the reliability of the RIPE NCC RPKI service. I can only guess how exactly the RIPE NCC RPKI publication service is configured, but I imagine there is a 'Signer Server' which writes the few thousand individual RPKI objects to disk, and separately there is an RSYNC server (rpki.ripe.net) which serves the files to RSYNC clients. Transferring sets of inter-related files around is a 'batch' operation; the pipeline should be set up accordingly.
As such, calling 'rsync' from crontab to populate the rpki.ripe.net rsync server would likely lead to inconsistent results. There are (at least) two objectives to keep in mind: 1/ While the Signer software is writing new files out to disk, the 'signer to publisher' replication process should not run, because the signer isn't finished yet. 2/ While a given RSYNC client is fetching from 'rpki.ripe.net', the 'signer to publisher' replication process should not alter the contents of the filesystem hierarchy the RSYNC client is fetching from. To satisfy the above two conditions, I suspect a number of solutions are available: A) take ownership and control and only launch subsequent pipeline steps when the Signer is done signing the latest requests. After a consistent set of files has been written to disk, only then copy, stage, and switch to the new directory contents using a symlink swap (allowing already connected RSYNC clients to complete their fetch). B) Use a load balancer to direct new RSYNC clients to an RSYNC server containing the latest (consistent) set of files. C) Make the RSYNC service pull from the latest (allegedly consistent) RRDP snapshot.xml file, then move newly connected clients to the new content using either the symlink [0] trick or orchestrated draining/onramping via a load balancer like haproxy. There is a wealth of knowledge available in this working group on how POSIX-like systems work, how ISP operations work, and how the RPKI works; I hope RIPE NCC can leverage that. Kind regards, Job [0]: http://sobornost.net/~job/rpki-rsync-move.sh.txt
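Option A above (stage a complete file set, then switch a symlink) can be sketched in Python as follows. This is not the linked shell script and not how RIPE NCC's pipeline works; the function name `publish_snapshot`, the `current` link name, and the idea that the rsync module points at that link are all assumptions for illustration. It also skips fsync and cleanup of old trees, which a real deployment would need (old trees must be kept until clients reading them have finished).

```python
import os
import tempfile


def publish_snapshot(base_dir, objects, live_name="current"):
    """Stage a complete, consistent file set, then atomically repoint a symlink.

    objects: {relative_path: bytes}. Returns the path of the new staging tree.
    """
    # 1. Write every file into a fresh directory no rsync client is reading yet.
    staging = tempfile.mkdtemp(prefix="snapshot-", dir=base_dir)
    os.chmod(staging, 0o755)  # mkdtemp creates 0700; the rsync daemon must read it
    for rel_path, data in objects.items():
        path = os.path.join(staging, rel_path)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)
    # 2. Create a temporary symlink pointing at the finished tree ...
    tmp_link = os.path.join(base_dir, live_name + ".tmp")
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(staging, tmp_link)
    # 3. ... and rename it over the live link. rename(2) replaces the link itself
    # in one atomic step: new clients see either the old or the new tree, while
    # clients already inside the old directory can finish their transfer.
    os.replace(tmp_link, os.path.join(base_dir, live_name))
    return staging
```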
Re: [routing-wg] RPKI Invalid == Reject policies on the AS 3333 EBGP border
Dear W. Boot, On Thu, Apr 01, 2021 at 12:38:27PM +0200, W. Boot wrote: > Would "invalid" also include unsigned space? No. By definition, unsigned space can never ever be "RPKI invalid". In order for any BGP route to be marked as "RPKI invalid", a covering RPKI ROA _MUST_ exist. Without covering ROAs, BGP routes cannot be "RPKI invalid". > If it does, that might lead to legacy space or networks getting space > through certain NIRs to be accidentally being blocked by whomever > relying on this, unless these blocks can be exempt from inclusion? Luckily it doesn't! :-) Operators who use RPKI to perform BGP Route Origin Validation do so to detect & reject invalid routes. As mentioned above, BGP routes can be recognized as 'invalid' only if a covering ROA exists. Complete and simple configuration examples can be found here: http://bgpfilterguide.nlnog.net/guides/reject_invalids/ By exclusively focussing on "RPKI invalid" BGP routes, RPKI ROV is incrementally deployable. Incremental deployability is a key factor. Kind regards, Job
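The reason unsigned space can never be "invalid" falls straight out of the Route Origin Validation procedure in RFC 6811: a route is only compared against VRPs (Validated ROA Payloads) whose prefix covers it, and with no covering VRP the outcome is "not found". A minimal Python sketch of that procedure follows; the `VRP` tuple and `rov_state` function are illustrative names, and the example data uses documentation prefixes and ASNs rather than real ROAs.

```python
import ipaddress
from typing import Iterable, NamedTuple


class VRP(NamedTuple):
    """Validated ROA Payload: prefix, maxLength and authorised origin ASN."""
    prefix: str
    max_length: int
    asn: int


def rov_state(route_prefix: str, origin_asn: int, vrps: Iterable[VRP]) -> str:
    """Classify a route as 'valid', 'invalid' or 'not-found' (after RFC 6811)."""
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for vrp in vrps:
        vrp_net = ipaddress.ip_network(vrp.prefix)
        if vrp_net.version != route.version or not route.subnet_of(vrp_net):
            continue  # this VRP does not cover the route
        covered = True
        # A VRP for AS0 never matches, so it can only ever make routes invalid.
        if vrp.asn != 0 and vrp.asn == origin_asn and route.prefixlen <= vrp.max_length:
            return "valid"
    # Only a route that some VRP covers (but none matches) can be invalid.
    return "invalid" if covered else "not-found"
```

For example, with a single VRP `("198.51.100.0/24", 24, 64496)`, the route `198.51.100.0/25` originated by AS64496 is invalid (longer than maxLength), while `203.0.113.0/24` from any origin is "not-found".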
Re: [routing-wg] Call for Presentations - RIPE 82
Hi all, The expectation is that we can watch material in the way it was intended, and have the presenter around for live Q and A / discussion. Presenters can even answer questions while the information is being distributed, which I find adds a new level of interaction previously not possible! If a presenter doesn’t want to prerecord (preference for living on the edge and doing it “live!”) that is fine too. We won’t force anyone to prerecord; however, we do expect folks to be around to cover the interactive element (which indeed is important). I’m not suggesting the meetings be turned into linear television (that indeed would not be adequate), rather that the formats of NANOG/NLNOG/Django are followed. Any presentation proposals on the topic of Routing are welcome, and whatever delivery format the presenter envisions as the best method to share their knowledge will be considered and facilitated with what resources we can muster. Proposals can be sent to routing-wg-cha...@ripe.net Kind regards, Job
[routing-wg] Call for Presentations - RIPE 82
Dear RIPE Routing WG, This is a call for presentation proposals for RIPE 82. The RIPE 82 meeting takes place in about 8 weeks: https://ripe82.ripe.net/ We ask the Working Group and RIPE NCC for presentation proposals for the illustrious 1-hour Routing WG slot on Thursday, May 20th 2021. When you submit a proposal, please also include slides for the chairs to review. If you've presented similar material elsewhere, please share with us when and where. We'll ask presenters to pre-record their talk to try to prevent local, logistical, and/or routing problems from impacting the meeting. :-) Kind regards, Job, Paul, Ignas Routing Working Group Chairs
Re: [routing-wg] RPKI Route Origin Validation and AS3333
Dear RIPE NCC, On Thu, Mar 18, 2021 at 04:03:16PM +0100, Nathalie Trenaman wrote: > From the network operations perspective, there are no obstacles to > enable ROV on AS Excellent news! > however, we have to consider that members or End Users who announce > something different in BGP than their ROA claims, will be dropped and > lose access to our services from their network. Another scenario where a member can't reach RIPE NCC is when the Member's network is not connected directly or indirectly to RIPE NCC's network. There are many, many scenarios in which this can happen. Imagine RIPE NCC purchases IP transit from Transit_X, and the member purchases IP transit from Transit_Y, but Transit_X and Transit_Y for one reason or another don't peer with each other. In such a network topology no exchange of IP traffic is possible between RIPE NCC and the Member. The Internet is a 'mostly' connected graph of nodes; the default-free zone is always in flux. Not everyone can reach everyone all the time. Sometimes an operator has to walk to the local teahouse or jump on the wifi network of their neighbor to help fix the connectivity issue. There never is ANY guarantee all Members or End Users can exchange IP traffic with RIPE NCC servers at all times. For this specific reason I applaud the fact that the RIPE NCC 'member sign-up process' can be executed online and ALSO via postal service. End-to-end Internet connectivity is not a requirement to do business with RIPE NCC. > From an analysis we made on 10 February, there were 511 of such > announcements from our members and End Users. Quick side-note: did your team check how many of those route announcements are covered by less-specific 'valid' or 'not-found' route announcements, or even by a default route?
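The side question can be made concrete: after ROV drops an 'invalid' more-specific, traffic still follows the longest remaining match, which may be a 'valid' or 'not-found' less-specific, or a default route. A minimal Python sketch, assuming a toy RIB of (prefix, ROV state) pairs; the function name `best_route_after_rov` and the example prefixes are hypothetical.

```python
from ipaddress import IPv4Network, IPv6Network, ip_address, ip_network
from typing import Iterable, Optional, Tuple, Union

Network = Union[IPv4Network, IPv6Network]


def best_route_after_rov(destination: str,
                         rib: Iterable[Tuple[str, str]]) -> Optional[Network]:
    """Longest-prefix match for destination, ignoring routes whose ROV state is 'invalid'.

    rib holds (prefix, rov_state) pairs, rov_state being 'valid', 'invalid'
    or 'not-found'. Returns None only when nothing else covers the destination,
    i.e. when rejecting invalids really does cost reachability.
    """
    addr = ip_address(destination)
    best: Optional[Network] = None
    for prefix, state in rib:
        if state == "invalid":
            continue  # an 'invalid == reject' policy never installs these
        net = ip_network(prefix)
        if net.version == addr.version and addr in net:
            if best is None or net.prefixlen > best.prefixlen:
                best = net
    return best
```

If the function still returns a route for an address inside a rejected 'invalid' prefix, dropping that announcement did not by itself cost connectivity towards that destination.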
To me or to this group the answer is not that relevant, but I raise this comment to point out that what matters most in service delivery is the end-to-end data-plane connectivity, and rejecting a few RPKI invalid routes in and of itself doesn't necessarily lead to loss of connectivity. > Our current RPKI Terms and Conditions do not mention that a Member or > End User ROA should match their routing intentions, or any > implications it may have if the ROA does not match their BGP > announcement. And indeed, the RPKI terms and conditions SHOULD NOT mention anything of such nature. As Relying Parties we can never know what people actually intended to publish in the RPKI. All any Relying Party knows is that the holder of the private keys of a CA with a set of subordinate resources managed to produce a cryptographically valid object validating according to the RPKI CP (RFC 6484), and that there is a valid chain towards the locally present Trust Anchor Locator. It is always laudable to try to stop children from running around with scissors, but RIPE NCC can't really stop operators from hurting their network presence. The best RIPE NCC can do is to try to design good User Interfaces and provide accurate documentation. > If the community decides it is important that AS performs ROV, our > legal team needs to update the RPKI Terms and Conditions to reflect > the potential impact. I challenge the above assertion as I do not believe the legal team has to update anything. The RIPE NCC network is connected to the Internet as 'best effort'. Whether specific individual IP packets originating from RIPE NCC's servers arrive at the final destination or not is not relevant on a case-by-case basis. An IP packet might be dropped because of ethernet port congestion, a routing partitioning gap in the BGP DFZ because of a peering dispute, a submarine cable cut, a software defect, a member misconfiguring an RPKI ROA, a local wifi problem, or any other reason... it doesn't matter.
All we hope for is that when Internet outages occur, someone somewhere is working on it. :-) Kind regards, Job
[routing-wg] Improving operations at RIPE NCC TA (Was: Delay in publishing RPKI objects)
Dear RIPE NCC, On Wed, Feb 17, 2021 at 11:28:32AM +0100, Nathalie Trenaman wrote: > > The multitude of RPKI service impacting events as a result from > > maloperation of the RIPE NCC trust anchor are starting to give me > > cause for concern. > > I’m sorry to hear this. Transparency is key for us, this means that we > report any event. In this case, we were not compliant with our CPS and > this non-publishing period had operational impact. From the previous email, there might be a misunderstanding about what rpki-client and Routinator do. Both utilities help Relying Parties validate X.509 signed CMS objects and transform the validated content into authorizations and attestations. Neither utility is an SLA or CPS compliance monitor. RIPE NCC - as CA operator - needs different tools. Neither utility has been designed to interpret the Certification Practice Statement (written in a natural language) and subsequently programmatically transform the described 'Service Level' into metrics suitable for monitoring. A relying party can never tell the difference between a publication pipeline being empty because CAs didn't issue new objects and a publication pipeline being empty because of a malfunction in one of RIPE NCC's RPKI subsystems. More examples of 'out of scope' functionality for Relying Party software: validators don't monitor whether lirportal.ripe.net is functional, whether RIPE NCC's BPKI API endpoints are operational, or whether LIRs paid their invoices; the list is quite long. The validators by themselves are the wrong tool for RPKI CPS/SLA monitoring. You state "transparency is key for us", but I fear ad-hoc, low-quality, a-posteriori reports are not the appropriate mechanism to impress and reassure this community regarding 'transparency'.
I have some tangible suggestions for RIPE NCC that will improve the reliability of the Trust Anchor and potentially help rebuild trust: A need for Certificate Transparency --- RIPE NCC should set up a Certificate Transparency project which publicly shows which certificates (fingerprints) were issued when, and store such information in immutable logs, accessible to all. How can anyone trust a Trust Anchor which does not offer transparency about its issuance process? Lack of transparency of the signer software --- The RIPE NCC WHOIS database software is open source, as is most of the software for RIPE Atlas, K-ROOT, and other efforts RIPE NCC has undertaken over the years. Why has the signer source code still not been open sourced? Why can't members review the code related to scheduled changes? Why is an organisation proclaiming 'transparency' being opaque about how the RPKI certificates are issued? Lack of a public status dashboard --- RIPE NCC should set up a website like https://rpki-status.ripe.net/ which shows dashboards with graphs and traffic lights related to each (best effort) commitment listed in the CPS. RIPE NCC should continuously publish & revoke & delete objects and verify whether those activities are visible externally, and then automatically report whether any potential delays observed are within the Service Levels outlined in the CPS. Metrics that come to mind: * delta between last certificate issuance & successful publication * Object count in the repository, repo size (and graphs) * Time-To-Keyroll (and graphs on duration & frequency) * Resource utilisation of various RPKI subsystems * aggregate bandwidth consumption for RPKI endpoints (including rrdp, API, rsync) * Graphs & logs of overlap between INRs listed on EE certificates under the RIPE TA and other commonly used TAs, matched against known transfers. This will help detect compromises as well as understand whether transfers are successful or not.
* Unique client IP count for RSYNC & RRDP for last hour/day/week * Number of CS/hostmaster tickets mentioning RPKI There are plenty of aspects to monitor; perhaps some notes should be copied from how the DNS root is monitored. Lack of operational experience with BGP ROV at RIPE NCC --- I believe a number of potential future incidents related to the RIPE NCC Trust Anchor can be prevented (or remediation time reduced) if RIPE NCC themselves apply RPKI-based BGP Origin Validation 'invalid == reject' policies on the AS EBGP border routers. RIPE NCC OPS themselves having a dependency on the RPKI services will increase organization-wide exposure to the (lack of) well-being of the Trust Anchor services, and given the short communication channels between the OPS team and the RPKI team my expectation is that we'll see problems being solved faster and perhaps even problems being prevented. An analogy: RIPE NCC is a chef refusing to eat their own food. How can we trust RIPE NCC to operate RPKI services, when RIPE NCC themselves refuses to apply the cryptographic products to their BGP
Re: [routing-wg] Delay in publishing RPKI objects
Dear RIPE NCC, On Tue, Feb 16, 2021 at 04:56:31PM +0100, Nathalie Trenaman wrote: > On Monday, 15 February we encountered an issue with our RPKI software. > This issue prevented us from publishing RPKI object updates from > 08:07-18:06 (UTC). > > During this period, Certificate Authority activation and Route Origin > Authorization configuration updates were delayed and therefore not > visible in the RPKI repository. It appears Certificate Authority revocation was also delayed. > The updates were published after we restarted the system at 17:45 > (UTC), with full recovery completed by 18:06 (UTC). Since this > non-publishing period is shorter than our default RPKI object validity > period, set to 8 hours, existing objects that are not updated were > still valid. No data was lost during this period. Can the following phrase "default RPKI object validity period, set to 8 hours" please be clarified? For objects produced in the RIPE-hosted RPKI environment I observe the following validity periods are commonly used:

Object type       | validity duration after issuance
------------------+---------------------------------
CRLs              | 24 hours
ROA EE certs      | 18 months
Manifest eContent | 24 hours
Manifest EE certs | 7 days
CAs               | 18 months

I'm just guessing: is the '8 hour' period a reference to RIPE-751 section 2.3? "A certificate will be published within eight hours of being issued (or deleted)." The RIPE-751 CPS also states in section 4.9.8 ("Maximum latency for CRLs"): CRLs will be published to the repository system within one hour of their generation. As the outage appears to have exceeded both the 1 hour revocation window and the 8 hour object publication window, RIPE NCC was not compliant with its own CPS. The multitude of RPKI service impacting events as a result from maloperation of the RIPE NCC trust anchor are starting to give me cause for concern. Kind regards, Job
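The compliance argument above is simple arithmetic on the times given in the incident report. A minimal Python sketch, assuming an update issued right at 08:07 (UTC) was only published at 18:06 (UTC), so the whole non-publishing window counts as its publication latency; the constant names and the `assess` helper are illustrative, and only the two CPS windows quoted above are checked.

```python
from datetime import datetime, timedelta, timezone

# Windows quoted from the RIPE-751 CPS.
CPS_PUBLICATION_WINDOW = timedelta(hours=8)  # section 2.3: certificates published within 8 hours
CPS_CRL_WINDOW = timedelta(hours=1)          # section 4.9.8: CRLs published within 1 hour

# Non-publishing window from the incident report (Monday 15 February 2021, UTC).
OUTAGE_START = datetime(2021, 2, 15, 8, 7, tzinfo=timezone.utc)
OUTAGE_END = datetime(2021, 2, 15, 18, 6, tzinfo=timezone.utc)


def assess(start: datetime, end: datetime) -> dict:
    """Compare the worst-case publication latency against the two CPS windows."""
    latency = end - start  # an update issued right at the start waited this long
    return {
        "latency": latency,
        "exceeds_publication_window": latency > CPS_PUBLICATION_WINDOW,
        "exceeds_crl_window": latency > CPS_CRL_WINDOW,
    }


report = assess(OUTAGE_START, OUTAGE_END)
print(report["latency"])  # 9:59:00
```

With a worst-case latency of 9 hours and 59 minutes, both the 1 hour CRL window and the 8 hour publication window are exceeded, which is the same conclusion as stated in the message; continuously measuring this delta is what the "delta between last certificate issuance & successful publication" metric on a status dashboard would capture.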