Dear colleagues,

Yesterday, we upgraded the Security World software on our RPKI core servers. 
The upgrade finished at approximately 08:45 UTC. We tested it and verified 
that everything worked before re-enabling the RPKI Dashboard.

At approximately 10:50 UTC, we received an alert from our monitoring that 
showed an error for both of our online Hardware Security Modules (HSMs). We 
immediately started investigating the alert and decided to temporarily stop 
RPKI Core to keep its state consistent. This also meant that we had to 
temporarily take the RPKI Dashboard offline.

At 11:22 UTC, we contacted our vendor, as we had never seen this behaviour 
before. A consultant from our vendor advised rebooting the HSMs, which we did 
at 11:55 UTC. After the reboot, the HSMs came back online and we re-enabled 
RPKI Core and the RPKI Dashboard. It is still unknown whether the upgrade was 
the direct cause of the errors, as the reported error was very generic.

While we work on finding the root cause, we will still need to reboot systems 
and HSMs occasionally. Each reboot makes the RPKI Dashboard unavailable for a 
few minutes, and objects will take a bit longer than usual to be published in 
our repository. As soon as we have more information, we will share it here.

As a result of this outage, we will speed up the process of replacing the 
online HSMs, which we described in our recent RIPE Labs article [0].

Kind regards,
Stella Vouteva

[0]: https://labs.ripe.net/author/ties/securing-the-ripe-ncc-trust-anchor/
