Re: [dns-operations] anchors.atlas.ripe.net/ripe.net - DNSSEC bogus due expiration
On Mon, Nov 06, 2023 at 08:37:12AM +0100, Stefan Ubbink via dns-operations wrote:

> > There could be a new "rndc" protocol verb that asks the nameserver
> > for a list of all the zones where the soonest expiration time is
> > below some threshold, or asks about a particular zone.
>
> This would still be based on polling the name server, and I think
> active signalling would be better. There is an IETF draft [1] which
> writes something about sending a signal when signatures are (about to)
> expire.

FYI: maybe the simplistic approach below might be nice for some operators. I like it because it is independent of the actual nameserver software.

The operator could simply grep for the RRSIGs of all zones on the nameserver. This quick and dirty approach gives me a list of >140.000 RRSIGs of about 7000 zones:

    $ grep -A1 RRSIG /var/named//* | awk '{print $2" "$1}' | grep ^20

Then just pipe the output to a simple script (e.g. perl) and compare the first column with the output of "date -u -d+5days +%Y%m%d%H%M00", and you have quickly and nicely checked that all RRSIGs are valid for at least 5 days.

quick and dirty perl:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Cutoff five days from now, in the YYYYMMDDHHMMSS form used by RRSIGs.
    # -u because RRSIG timestamps are in UTC.
    my $date = `date -u -d+5days +%Y%m%d%H%M00`;
    chomp $date;
    my %seen;
    while (<>) {
        chomp;
        # Input lines are "<expiration> <file>".
        my ($expiration, $file) = split(/ /, $_);
        next if $expiration > $date;   # still valid for more than 5 days
        next if $seen{$file};          # report each file only once
        $seen{$file} = 1;
        print "rrsig with lifetime <5 days: $file ($expiration)\n";
    }

If it's preferred to run such operations on a workstation/monitoring station, one could AXFR the zones using dig and check the RRSIGs there.

Andreas
___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations
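For readers who prefer not to shell out to date(1), the same comparison can be sketched in Python. This is only an illustration of the check described above, not Andreas's script: the function name `expiring_files` and its parameters are made up here, and the input is assumed to be the "<expiration> <file>" lines produced by the grep/awk pipeline.

```python
from datetime import datetime, timedelta, timezone


def expiring_files(lines, now, min_days=5):
    """Return {file: expiration} for files with an RRSIG expiring within min_days.

    Each input line is "<YYYYMMDDHHMMSS> <file>". `now` should be a
    timezone-aware UTC datetime, since RRSIG timestamps are UTC.
    Only the first offending RRSIG per file is recorded.
    """
    cutoff = (now + timedelta(days=min_days)).strftime("%Y%m%d%H%M%S")
    flagged = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2 or len(parts[0]) != 14 or not parts[0].isdigit():
            continue  # skip anything that is not a timestamp/file pair
        expiration, zone_file = parts
        # Fixed-width digit strings order correctly as plain strings.
        if expiration <= cutoff and zone_file not in flagged:
            flagged[zone_file] = expiration
    return flagged
```

Feeding it `sys.stdin` and `datetime.now(timezone.utc)` would give roughly the behaviour of the Perl loop; passing `now` explicitly keeps the function easy to test.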
Re: [dns-operations] anchors.atlas.ripe.net/ripe.net - DNSSEC bogus due expiration
On Thu, 2 Nov 2023 11:18:34 -0400 Viktor Dukhovni wrote:

> On Thu, Nov 02, 2023 at 09:34:17AM +0100, Stephane Bortzmeyer wrote:
>
> > > Specifically, in the case of signed zones, monitoring MUST also
> > > include regular checks of the remaining expiration time of at
> > > least the core zone apex records (DNSKEY, SOA and NS), and
> > > ideally the whole zone, both on the primary server and the
> > > secondaries.
> >
> > Indeed. If you use Nagios or compatible (such as Icinga), I
> > recommend this plugin for signatures monitoring:
> >
> > http://dns.measurement-factory.com/tools/nagios-plugins/check_zone_rrsig_expiration.html
>
> I wonder whether the widely [used] authoritative [nameservers] could
> do more to help?
>
> For example, BIND loads zone data into memory. It should be able to
> know the time of the soonest signature expiration for a zone, or at
> least (if not loading the whole zone into memory) the soonest
> expiration time of recently queried records.
>
> There could be a new "rndc" protocol verb that asks the nameserver
> for a list of all the zones where the soonest expiration time is
> below some threshold, or asks about a particular zone.

This would still be based on polling the name server, and I think active signalling would be better. There is an IETF draft [1] which writes something about sending a signal when signatures are (about to) expire.

[1] https://datatracker.ietf.org/doc/draft-grubto-dnsop-dns-out-of-protocol-signalling/

--
Stefan Ubbink
DNS & Systems Engineer
Present: Mon, Tue, Wed, Fri
SIDN | Meander 501 | 6825 MD | ARNHEM | The Netherlands
T +31 (0)26 352 55 00
https://www.sidn.nl
Re: [dns-operations] anchors.atlas.ripe.net/ripe.net - DNSSEC bogus due expiration
It'd still be good to have that exposed as a metric, since:

* that way you don't have to wait to make the mistake (or to find the logs from someone else's mistake) in order to wrap alerting around it

* the metric's more or less the metric forever-ish, while it seems more likely that a well-intentioned phrasing change in one of the logs could screw up whatever pattern's being used to match it

* I personally think that the metric is somehow more in my face than the logs (e.g., "oh look, I dumped the metrics with a curl/wget and that looks very much like a counter we need to wrap something around")

* for those living in the Prometheus/Grafana/Loki ecosystem, it may be a bit easier to just run a copy of the BIND exporter (https://github.com/prometheus-community/bind_exporter) than to make sure that all the logs are getting scraped appropriately and the path to get them into Loki works and keeps working all the time -- it being easier to generate a no-data alert for a metric than it is to say "this log message we never get, we still haven't gotten it"

And yes, I recognize that "well, Steve, the code's right over here, go to it" is a valid argument.

-Steve

On 11/3/2023 6:09 AM, Vladimír Čunát via dns-operations wrote:

> My understanding is that in this case the signer was producing loud
> syslog warnings immediately when the issue happened (i.e. long before
> validation could fail).
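To make the "expose it as a metric" idea concrete, here is a minimal Python sketch that renders a per-zone gauge in the Prometheus text exposition format. It is an assumption-laden illustration: the metric name `dnssec_rrsig_soonest_expiry_seconds` and the function `render_expiry_metrics` are hypothetical, and nothing here claims that BIND or bind_exporter exposes such a value today.

```python
def render_expiry_metrics(soonest_expiry, now):
    """Render per-zone gauges in the Prometheus text exposition format.

    soonest_expiry maps zone name -> Unix timestamp of the zone's earliest
    RRSIG expiration; now is the current Unix timestamp.
    """
    name = "dnssec_rrsig_soonest_expiry_seconds"  # hypothetical metric name
    out = [
        f"# HELP {name} Seconds until the earliest RRSIG in the zone expires.",
        f"# TYPE {name} gauge",
    ]
    for zone in sorted(soonest_expiry):
        remaining = int(soonest_expiry[zone] - now)  # negative once expired
        out.append(f'{name}{{zone="{zone}"}} {remaining}')
    return "\n".join(out) + "\n"
```

A gauge like this supports both a threshold alert (value too low) and the no-data alert Steve mentions (series absent), which is harder to express against log lines.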
Re: [dns-operations] anchors.atlas.ripe.net/ripe.net - DNSSEC bogus due expiration
On Fri, Nov 03, 2023 at 11:09:02AM +0100, Vladimír Čunát via dns-operations wrote:

> On 01/11/2023 17.18, Viktor Dukhovni wrote:
> > Should authoritative [nameservers] have knobs to perform internal checks on
> > the signed zones they serve and at least syslog loud warnings?
>
> My understanding is that in this case the signer was producing loud syslog
> warnings immediately when the issue happened (i.e. long before validation
> could fail).

Sure, but the warnings were far from a clear indication that resigning of the entire zone has stopped. In any case, logging isn't exactly the best interface for realtime monitoring.

I do think that exposing the next expiration time for monitoring and likewise a list of zones where that time is too soon would be of value to operators.

It doesn't obviate the need for active query probes, those should still also happen, but I do think that operators would benefit from such a (new) signal.

--
    Viktor.
Re: [dns-operations] anchors.atlas.ripe.net/ripe.net - DNSSEC bogus due expiration
On 01/11/2023 17.18, Viktor Dukhovni wrote:

> Should authoritative [nameservers] have knobs to perform internal checks
> on the signed zones they serve and at least syslog loud warnings?

My understanding is that in this case the signer was producing loud syslog warnings immediately when the issue happened (i.e. long before validation could fail).
Re: [dns-operations] anchors.atlas.ripe.net/ripe.net - DNSSEC bogus due expiration
I liked Viktor’s idea. It would be cool if time-to-re-sign and time-to-signature-expiration were available on the json/xml stats port.

(Or are they and I missed it? The last time I used the json/xml stuff, I wasn’t getting metrics for signed zones, just the usual counters and the time-to-expire for secondaries…)

-Steve

> On Nov 3, 2023, at 1:43 AM, Mark Andrews wrote:
>
>> On 3 Nov 2023, at 02:18, Viktor Dukhovni wrote:
>>
>> On Thu, Nov 02, 2023 at 09:34:17AM +0100, Stephane Bortzmeyer wrote:
>>
>>>> Specifically, in the case of signed zones, monitoring MUST also include
>>>> regular checks of the remaining expiration time of at least the core
>>>> zone apex records (DNSKEY, SOA and NS), and ideally the whole zone, both
>>>> on the primary server and the secondaries.
>>>
>>> Indeed. If you use Nagios or compatible (such as Icinga), I recommend
>>> this plugin for signatures monitoring:
>>>
>>> http://dns.measurement-factory.com/tools/nagios-plugins/check_zone_rrsig_expiration.html
>>
>> I wonder whether the widely [used] authoritative [nameservers] could do
>> more to help?
>>
>> For example, BIND loads zone data into memory. It should be able to
>> know the time of the soonest signature expiration for a zone, or at
>> least (if not loading the whole zone into memory) the soonest expiration
>> time of recently queried records.
>
> When you let named perform the signing it does just that. The RRSIGs are
> in a heap. We look at the earliest expiration and figure out when it is
> due to be re-signed (could be in the past if the server was offline for a
> while). We set a timer. When that timer expires we re-sign that RRset plus
> several more along with an updated SOA record, re-adding them to the heap.
> We set a timer for the next batch. If the primary has been down too long
> and they have all expired the entire zone will be signed this way when the
> primary starts up.
>
>> There could be a new "rndc" protocol verb that asks the nameserver for a
>> list of all the zones where the soonest expiration time is below some
>> threshold, or asks about a particular zone.
>>
>> Of course in that case the monitoring agent would be in a "privileged"
>> position to query the nameserver's internal control plane, rather than
>> having to send queries through "the front door".
>>
>> Both kinds of monitoring are likely important, but more visibility via
>> the control plane may be able to offer a precise/timely view.
>>
>> - Check each nameserver's control plane.
>> - Check as much of the zone as possible.
>> - Check each nameserver VIP over each supported protocol
>>   (UDP, TCP, DoT, DoQ, ...)
>> - From multiple vantage points if possible/applicable when
>>   service is on anycast IPs.
>>
>> Perhaps, through OARC, support development of monitoring plugins that
>> many operators can use?
>>
>> If after all the past incidents minor and not so minor operators
>> still aren't doing adequate monitoring, perhaps we (the software
>> and standards developers) haven't given them adequate tools?
>>
>> --
>> Viktor.
>
> --
> Mark Andrews, ISC
> 1 Seymour St., Dundas Valley, NSW 2117, Australia
> PHONE: +61 2 9871 4742 INTERNET: ma...@isc.org
Re: [dns-operations] anchors.atlas.ripe.net/ripe.net - DNSSEC bogus due expiration
> On 3 Nov 2023, at 02:18, Viktor Dukhovni wrote:
>
> On Thu, Nov 02, 2023 at 09:34:17AM +0100, Stephane Bortzmeyer wrote:
>
>>> Specifically, in the case of signed zones, monitoring MUST also include
>>> regular checks of the remaining expiration time of at least the core
>>> zone apex records (DNSKEY, SOA and NS), and ideally the whole zone, both
>>> on the primary server and the secondaries.
>>
>> Indeed. If you use Nagios or compatible (such as Icinga), I recommend
>> this plugin for signatures monitoring:
>>
>> http://dns.measurement-factory.com/tools/nagios-plugins/check_zone_rrsig_expiration.html
>
> I wonder whether the widely [used] authoritative [nameservers] could do
> more to help?
>
> For example, BIND loads zone data into memory. It should be able to
> know the time of the soonest signature expiration for a zone, or at
> least (if not loading the whole zone into memory) the soonest expiration
> time of recently queried records.

When you let named perform the signing it does just that. The RRSIGs are in a heap. We look at the earliest expiration and figure out when it is due to be re-signed (could be in the past if the server was offline for a while). We set a timer. When that timer expires we re-sign that RRset plus several more along with an updated SOA record, re-adding them to the heap. We set a timer for the next batch. If the primary has been down too long and they have all expired the entire zone will be signed this way when the primary starts up.

> There could be a new "rndc" protocol verb that asks the nameserver for a
> list of all the zones where the soonest expiration time is below some
> threshold, or asks about a particular zone.
>
> Of course in that case the monitoring agent would be in a "privileged"
> position to query the nameserver's internal control plane, rather than
> having to send queries through "the front door".
>
> Both kinds of monitoring are likely important, but more visibility via
> the control plane may be able to offer a precise/timely view.
>
> - Check each nameserver's control plane.
> - Check as much of the zone as possible.
> - Check each nameserver VIP over each supported protocol
>   (UDP, TCP, DoT, DoQ, ...)
> - From multiple vantage points if possible/applicable when
>   service is on anycast IPs.
>
> Perhaps, through OARC, support development of monitoring plugins that
> many operators can use?
>
> If after all the past incidents minor and not so minor operators
> still aren't doing adequate monitoring, perhaps we (the software
> and standards developers) haven't given them adequate tools?
>
> --
> Viktor.

--
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742 INTERNET: ma...@isc.org
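The heap-and-timer scheme Mark describes can be pictured with a small Python sketch. This is a toy model built only from his prose, not named's implementation: the names `resign_batch` and `next_timer`, the batch size, the validity period and the re-sign lead time are all assumptions for illustration, and the SOA update is omitted.

```python
import heapq


def next_timer(heap, resign_lead):
    """Time at which the earliest-expiring RRset is due for re-signing.

    heap is a heapq min-heap of (expiration, rrset_name) tuples. The result
    may lie in the past if the server was offline for a while.
    """
    return heap[0][0] - resign_lead if heap else None


def resign_batch(heap, now, validity, batch_size=3):
    """Re-sign the batch_size RRsets with the earliest expirations.

    "Re-signing" is modelled as pushing the RRset back onto the heap with a
    fresh expiration of now + validity. Returns the names re-signed.
    """
    batch = []
    while heap and len(batch) < batch_size:
        _, name = heapq.heappop(heap)
        batch.append(name)
    for name in batch:
        heapq.heappush(heap, (now + validity, name))
    return batch
```

In this model `heap[0][0]` is exactly the "soonest signature expiration" value the thread asks to have exposed, which is why a signer that already keeps such a heap could report it cheaply.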
Re: [dns-operations] anchors.atlas.ripe.net/ripe.net - DNSSEC bogus due expiration
On Thu, Nov 02, 2023 at 09:34:17AM +0100, Stephane Bortzmeyer wrote:

> > Specifically, in the case of signed zones, monitoring MUST also include
> > regular checks of the remaining expiration time of at least the core
> > zone apex records (DNSKEY, SOA and NS), and ideally the whole zone, both
> > on the primary server and the secondaries.
>
> Indeed. If you use Nagios or compatible (such as Icinga), I recommend
> this plugin for signatures monitoring:
>
> http://dns.measurement-factory.com/tools/nagios-plugins/check_zone_rrsig_expiration.html

I wonder whether the widely [used] authoritative [nameservers] could do more to help?

For example, BIND loads zone data into memory. It should be able to know the time of the soonest signature expiration for a zone, or at least (if not loading the whole zone into memory) the soonest expiration time of recently queried records.

There could be a new "rndc" protocol verb that asks the nameserver for a list of all the zones where the soonest expiration time is below some threshold, or asks about a particular zone.

Of course in that case the monitoring agent would be in a "privileged" position to query the nameserver's internal control plane, rather than having to send queries through "the front door".

Both kinds of monitoring are likely important, but more visibility via the control plane may be able to offer a precise/timely view.

    - Check each nameserver's control plane.
    - Check as much of the zone as possible.
    - Check each nameserver VIP over each supported protocol
      (UDP, TCP, DoT, DoQ, ...)
    - From multiple vantage points if possible/applicable when
      service is on anycast IPs.

Perhaps, through OARC, support development of monitoring plugins that many operators can use?

If after all the past incidents minor and not so minor operators still aren't doing adequate monitoring, perhaps we (the software and standards developers) haven't given them adequate tools?

--
    Viktor.
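As a rough picture of what the proposed control-plane query might return, the Python sketch below lists zones whose soonest RRSIG expiration is under a threshold. No such rndc verb exists in the thread's telling; the function name `zones_below_threshold` and its input shape (zone name mapped to expiration timestamps) are assumptions made purely for illustration.

```python
def zones_below_threshold(zones, now, threshold):
    """Return [(zone, remaining_seconds)] for zones expiring too soon.

    zones maps zone name -> iterable of RRSIG expiration Unix timestamps.
    A zone is reported when its earliest expiration is less than
    `threshold` seconds after `now`. Most urgent zones come first.
    """
    report = []
    for zone, expirations in zones.items():
        expirations = list(expirations)
        if not expirations:
            continue  # no signatures, e.g. an unsigned zone
        remaining = min(expirations) - now
        if remaining < threshold:
            report.append((zone, remaining))
    return sorted(report, key=lambda item: item[1])
```

Asking about a single zone, the other half of the proposal, is the same computation restricted to one key.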
Re: [dns-operations] anchors.atlas.ripe.net/ripe.net - DNSSEC bogus due expiration
On Wed, Nov 01, 2023 at 12:18:42PM -0400, Viktor Dukhovni wrote a message of 67 lines which said:

> Specifically, in the case of signed zones, monitoring MUST also include
> regular checks of the remaining expiration time of at least the core
> zone apex records (DNSKEY, SOA and NS), and ideally the whole zone, both
> on the primary server and the secondaries.

Indeed. If you use Nagios or compatible (such as Icinga), I recommend this plugin for signatures monitoring:

http://dns.measurement-factory.com/tools/nagios-plugins/check_zone_rrsig_expiration.html

(If you use Debian, it is in the package monitoring-plugins-contrib.)
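For operators unfamiliar with how a Nagios-compatible check reports its result, here is a minimal Python sketch of the threshold-to-status mapping. It is not the code of the plugin recommended above: the function `rrsig_check`, its default thresholds and its message wording are assumptions; only the exit-code convention (0 OK, 1 WARNING, 2 CRITICAL) is the standard one for Nagios plugins.

```python
OK, WARNING, CRITICAL = 0, 1, 2  # conventional Nagios plugin exit codes


def rrsig_check(zone, remaining_days, warn_days=7.0, crit_days=3.0):
    """Map remaining RRSIG validity to a (status code, message) pair.

    warn_days and crit_days are hypothetical defaults; pick values that
    leave enough time to react before signatures actually expire.
    """
    if remaining_days <= crit_days:
        code, label = CRITICAL, "CRITICAL"
    elif remaining_days <= warn_days:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    message = f"ZONE {label}: {zone} RRSIGs valid for {remaining_days:.1f} more days"
    return code, message
```

A wrapper script would print the message and call `sys.exit(code)`, which is all the monitoring system needs to raise an alert.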
Re: [dns-operations] anchors.atlas.ripe.net/ripe.net - DNSSEC bogus due expiration
On Wed, Nov 01, 2023 at 04:49:01PM +0100, Mark Andrews wrote:

> It shouldn’t take any time as the bogus records shouldn’t have been cached.

Right, unlike mismatched parent-side DS RRs, RRSIG expiration heals fairly promptly once the zone is resigned at the origin.

I am repeatedly surprised when I hear of operators finding out about RRSIG expiration after the fact from 3rd parties. Somehow the reflexive knowledge that DNS monitoring means not only:

    - Is it still working at this very moment

but also:

    - Is it about to stop working if nothing is done soon

appears to not have become an ingrained part of the operator culture.

    * What can we as a community do to get the message out?
    * What tooling improvements could make this easier for operators?

Specifically, in the case of signed zones, monitoring MUST also include regular checks of the remaining expiration time of at least the core zone apex records (DNSKEY, SOA and NS), and ideally the whole zone, both on the primary server and the secondaries.

There needs to be a minimum acceptable remaining RRSIG time that's some reasonable fraction of the total RRSIG lifetime, which if crossed leaves enough time for the responsible operator to react and rectify any issues.

My tiny zones are monitored to not go below ~π days of remaining RRSIG validity. :-)

    ldns-verify-zone -e P0Y0M3DT3H23M54S -V1 ...

[ Of course that minimum time needs to be less than the threshold at which extant records are normally resigned. ]

Should authoritative [nameservers] have knobs to perform internal checks on the signed zones they serve and at least syslog loud warnings? If there were some protocol to get a message into a monitoring system, that would be even better...

Ideally, if operators cannot or do not implement the requisite monitoring on their own, can we make it easy enough to do, and sufficiently prominently documented or otherwise well known, that they start doing it?

"Unmonitored critical service", especially when it involves security, should be an oxymoron.

--
    Viktor.
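The "~π days" remark can be checked with a few lines of Python that convert the period P0Y0M3DT3H23M54S into seconds and days. This is just arithmetic on the value quoted in the message; the helper `duration_seconds` is a made-up name and handles only the simple ISO 8601 duration form shown there, rejecting non-zero years and months because they have no fixed length.

```python
import re

# Matches durations of the form P[n]Y[n]M[n]DT[n]H[n]M[n]S.
_DURATION = re.compile(
    r"^P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)D)?"
    r"(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$"
)


def duration_seconds(text):
    """Convert a simple ISO 8601 duration to a number of seconds."""
    match = _DURATION.match(text)
    if not match:
        raise ValueError(f"not a supported duration: {text!r}")
    years, months, days, hours, minutes, seconds = (
        int(group or 0) for group in match.groups()
    )
    if years or months:
        raise ValueError("years and months have no fixed length in seconds")
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds
```

For the period in the message this gives 3 days + 3 h + 23 min + 54 s = 271434 seconds, and 271434 / 86400 is about 3.14160 days, within a second of π days (π × 86400 is about 271433.6).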
Re: [dns-operations] anchors.atlas.ripe.net/ripe.net - DNSSEC bogus due expiration
It shouldn’t take any time as the bogus records shouldn’t have been cached.

--
Mark Andrews

> On 1 Nov 2023, at 15:06, Paul de Weerd wrote:
>
> Dear Matthew,
>
>> On 01/11/2023 12:13, Matthew Richardson via dns-operations wrote:
>> Our systems use some RIPE Atlas anchors for general connectivity
>> monitoring. Just now, they all failed.
>> It looks as if DNSSEC has expired:-
>> https://dnsviz.net/d/anchors.atlas.ripe.net/dnssec/
>> It also looks as if other things in ripe.net may also have expired (e.g.
>> www.ripe.net when looking for a contact to advise of this).
>
> Indeed, there was an issue with the DNSSEC signatures on the ripe.net zone
> expiring earlier today (20231101104448). As Stephane commented, this was
> resolved at 12:15 (UTC) on our end, but as usual it may take some time for
> the fixed zone to propagate to all caches.
>
> We are working on a post mortem about this incident and will share that with
> the community ASAP.
>
> For future reference, in case of issues with the ripe.net services,
> https://status.ripe.net/ should be the go-to place. Admittedly, with the
> ripe.net zone bogus, that was also unavailable - something more to consider
> going forward.
>
> Best regards,
>
> Paul de Weerd
> Manager Global Information Infrastructure team
> RIPE NCC
Re: [dns-operations] anchors.atlas.ripe.net/ripe.net - DNSSEC bogus due expiration
Dear Matthew,

On 01/11/2023 12:13, Matthew Richardson via dns-operations wrote:

> Our systems use some RIPE Atlas anchors for general connectivity
> monitoring. Just now, they all failed.
> It looks as if DNSSEC has expired:-
> https://dnsviz.net/d/anchors.atlas.ripe.net/dnssec/
> It also looks as if other things in ripe.net may also have expired (e.g.
> www.ripe.net when looking for a contact to advise of this).

Indeed, there was an issue with the DNSSEC signatures on the ripe.net zone expiring earlier today (20231101104448). As Stephane commented, this was resolved at 12:15 (UTC) on our end, but as usual it may take some time for the fixed zone to propagate to all caches.

We are working on a post mortem about this incident and will share that with the community ASAP.

For future reference, in case of issues with the ripe.net services, https://status.ripe.net/ should be the go-to place. Admittedly, with the ripe.net zone bogus, that was also unavailable - something more to consider going forward.

Best regards,

Paul de Weerd
Manager Global Information Infrastructure team
RIPE NCC
Re: [dns-operations] anchors.atlas.ripe.net/ripe.net - DNSSEC bogus due expiration
On Wed, Nov 01, 2023 at 01:37:14PM +0100, Stephane Bortzmeyer wrote a message of 17 lines which said:

> > It looks as if DNSSEC has expired:-
>
> It seems it has been repaired around 1215 UTC.

https://twitter.com/ripencc/status/1719712189496311986

"Our services have been restored and all services are operational. We believe the root cause of the issue was DNSSEC-related, and we are continuing to monitor the situation. We will soon share a postmortem on our status page: https://status.ripe.net"
Re: [dns-operations] anchors.atlas.ripe.net/ripe.net - DNSSEC bogus due expiration
On Wed, Nov 01, 2023 at 11:13:15AM +, Matthew Richardson via dns-operations wrote a message of 64 lines which said:

> It looks as if DNSSEC has expired:-

It seems it has been repaired around 1215 UTC.
[dns-operations] anchors.atlas.ripe.net/ripe.net - DNSSEC bogus due expiration
Our systems use some RIPE Atlas anchors for general connectivity monitoring. Just now, they all failed.

It looks as if DNSSEC has expired:-

https://dnsviz.net/d/anchors.atlas.ripe.net/dnssec/

It also looks as if other things in ripe.net may also have expired (e.g. www.ripe.net when looking for a contact to advise of this).

--
Best wishes,
Matthew