Dear colleagues,

Yesterday, 19 September 2022, between 14:36 and 15:08, clients may have
observed an inconsistent state of rrdp.ripe.net. After becoming aware of a
potential issue, we mitigated the problem by switching to our secondary RRDP
infrastructure before starting our investigation.

During the incident, the RRDP repository
https://rrdp.ripe.net/notification.xml referred to files that were not
available. Out of over 14.5k requests (excluding traffic after failover),
5289 requests resulted in a 404 response. According to our logs and external
monitoring, the rsync repository rsync://rpki.ripe.net was fully available
during this period, although the service may have been slower than usual due
to a high I/O load.
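
For illustration only, the kind of inconsistency described above (a
notification file that references snapshot or delta files which cannot be
retrieved) could be detected with a check like the minimal sketch below. It
assumes the notification file layout defined in RFC 8182. The sample
document, the example.net URIs, and the is_available callback are
hypothetical and are not taken from the incident.

```python
import xml.etree.ElementTree as ET

# XML namespace used by RRDP documents (RFC 8182).
RRDP_NS = "{http://www.ripe.net/rpki/rrdp}"


def referenced_uris(notification_xml):
    """Return the snapshot and delta URIs listed in a notification file."""
    root = ET.fromstring(notification_xml)
    uris = []
    for tag in ("snapshot", "delta"):
        for element in root.findall(RRDP_NS + tag):
            uris.append(element.attrib["uri"])
    return uris


def find_missing(notification_xml, is_available):
    """Return the referenced URIs for which is_available(uri) is false.

    is_available is a hypothetical callback; a real check might issue an
    HTTP request per URI and treat a 404 response as unavailable.
    """
    return [u for u in referenced_uris(notification_xml) if not is_available(u)]


# Hypothetical sample document, for illustration only.
SAMPLE = """<notification xmlns="http://www.ripe.net/rpki/rrdp" version="1"
    session_id="9df4b597-af9e-4dca-bdda-719cce2c4e28" serial="3">
  <snapshot uri="https://rrdp.example.net/s/3/snapshot.xml" hash="AB"/>
  <delta serial="3" uri="https://rrdp.example.net/s/3/delta.xml" hash="CD"/>
  <delta serial="2" uri="https://rrdp.example.net/s/2/delta.xml" hash="EF"/>
</notification>"""

if __name__ == "__main__":
    # Pretend the files for serial 3 exist only on another server.
    published = {"https://rrdp.example.net/s/2/delta.xml"}
    print(find_missing(SAMPLE, lambda uri: uri in published))
```

Passing availability in as a callback keeps the parsing logic testable
without network access.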

The incident was triggered by an automatic failover to a backup publication
server. The primary server recovered within seconds, which resulted in an
automatic switch back. From our logs, we observed that HTTP Keep-Alive kept
the CDN connected to the backup server for notification.xml, while new
connections for snapshot and delta files were made to the primary server.

We recognize that this outage report is similar to the one from 12 September
[1], and we will continue fixing these race conditions in our infrastructure.

[1] https://www.ripe.net/ripe/mail/archives/routing-wg/2022-September/004606.html

Kind regards,

Bart Bakker
Senior Software Engineer
RIPE NCC

