Re: [dns-operations] anchors.atlas.ripe.net/ripe.net - DNSSEC bogus due expiration

2023-11-06 Thread Andreas S. Kerber
On Mon, Nov 06, 2023 at 08:37:12AM +0100, Stefan Ubbink via dns-operations wrote:
> > There could be a new "rndc" protocol verb that asks the nameserver
> > for a list of all the zones where the soonest expiration time is
> > below some threshold, or asks about a particular zone.
> 
> This would still be based on polling the name server, and I think
> active signalling would be better. There is an IETF draft [1] which
> writes something about sending a signal when signatures are (about to)
> expire.

FYI: maybe the simplistic approach below might be nice for some operators.
I like it because it is independent of the actual nameserver software.

The operator could simply grep for the RRSIGs of all zones on the nameserver.
This quick and dirty approach gives me a list of >140,000 RRSIGs from about
7000 zones:

$ grep -A1 RRSIG /var/named//* | awk '{print $2" "$1}' | grep ^20

Then just pipe the output to a simple script (e.g. Perl) and compare the first
column with the output of "date -u -d+5days +%Y%m%d%H%M00" (RRSIG timestamps
are in UTC), and you have quickly and nicely checked that all RRSIGs are valid
for at least 5 days.

quick and dirty perl:

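#!/usr/bin/perl
use strict;
use warnings;

# Threshold: now + 5 days in RRSIG timestamp format (YYYYMMDDHHMMSS, UTC).
my $date = `date -u -d+5days +%Y%m%d%H%M00`;
chomp $date;

my %seen;
while (<>) {
   chomp;
   my ($expire, $file) = split(/ /, $_);
   next if $expire > $date;    # signature is valid for more than 5 days
   next if $seen{$file}++;     # report each zone file only once
   print "rrsig with lifetime <5 days: $file ($expire)\n";
}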

If it's preferred to run such operations on a workstation/monitoring station,
one could AXFR the zones using dig and check the RRSIGs there.
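
For anyone who would rather not shell out to date, the same comparison can be sketched in Python. This is only an illustration, not part of the original suggestion: it assumes input lines of the form "<expiration> <file>" as printed by the grep/awk pipeline above, and the function name expiring_files is made up for the example.

```python
from datetime import datetime, timedelta, timezone

RRSIG_TIME_FORMAT = "%Y%m%d%H%M%S"  # RRSIG presentation format, in UTC


def expiring_files(lines, now, min_remaining=timedelta(days=5)):
    """Return {file: earliest expiration} for every file that has an RRSIG
    expiring at or before now + min_remaining.

    `lines` are "<expiration> <file>" strings; lines that do not have
    exactly two fields or whose timestamp does not parse are skipped.
    """
    deadline = now + min_remaining
    result = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue
        stamp, zone_file = parts
        try:
            expires = datetime.strptime(stamp, RRSIG_TIME_FORMAT).replace(
                tzinfo=timezone.utc
            )
        except ValueError:
            continue  # not a parseable timestamp
        if expires > deadline:
            continue  # still valid for long enough
        # Keep only the earliest expiration seen for each file.
        if zone_file not in result or expires < result[zone_file]:
            result[zone_file] = expires
    return result
```

Unlike the Perl version, which reports the first short-lived signature it meets in each file, this keeps the earliest expiration per file. Calling it with sys.stdin and datetime.now(timezone.utc) would plug it into the same pipeline.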

Andreas
___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] anchors.atlas.ripe.net/ripe.net - DNSSEC bogus due expiration

2023-11-05 Thread Stefan Ubbink via dns-operations
--- Begin Message ---
On Thu, 2 Nov 2023 11:18:34 -0400
Viktor Dukhovni  wrote:


> On Thu, Nov 02, 2023 at 09:34:17AM +0100, Stephane Bortzmeyer wrote:
> 
> > > Specifically, in the case of signed zones, monitoring MUST also
> > > include regular checks of the remaining expiration time of at
> > > least the core zone apex records (DNSKEY, SOA and NS), and
> > > ideally the whole zone, both on the primary server and the
> > > secondaries.  
> > 
> > Indeed. If you use Nagios or compatible (such as Icinga), I
> > recommend this plugin for signatures monitoring:
> > 
> > http://dns.measurement-factory.com/tools/nagios-plugins/check_zone_rrsig_expiration.html
> >  
> 
> I wonder whether the widely used authoritative nameservers could do more
> to help?
> 
> For example, BIND loads zone data into memory.  It should be able to
> know the time of the soonest signature expiration for a zone, or at
> least (if not loading the whole zone into memory) the soonest
> expiration time of recently queried records.
> 
> There could be a new "rndc" protocol verb that asks the nameserver
> for a list of all the zones where the soonest expiration time is
> below some threshold, or asks about a particular zone.

This would still be based on polling the name server, and I think
active signalling would be better. There is an IETF draft [1] which
writes something about sending a signal when signatures are (about to)
expire.


[1]
https://datatracker.ietf.org/doc/draft-grubto-dnsop-dns-out-of-protocol-signalling/

-- 
Stefan Ubbink
DNS & Systems Engineer
Present: Mon, Tue, Wed, Fri
SIDN | Meander 501 | 6825 MD | ARNHEM | The Netherlands
T +31 (0)26 352 55 00
https://www.sidn.nl


--- End Message ---


Re: [dns-operations] anchors.atlas.ripe.net/ripe.net - DNSSEC bogus due expiration

2023-11-03 Thread Steven Miller

It'd still be good to have that exposed as a metric, since:

 * that way you don't have to wait to make the mistake (or to find the
   logs from someone else's mistake) in order to wrap alerting around it
 * the metric's more or less the metric forever-ish, while it seems
   more likely that a well-intentioned phrasing change in one of the
   logs could screw up whatever pattern's being used to match it
 * I personally think that the metric is somehow more in my face than
   the logs (e.g., "oh look, I dumped the metrics with a curl/wget and
   that looks very much like a counter we need to wrap something
   around" )
 * for those living in the Prometheus/Grafana/Loki ecosystem, it may be
   a bit easier to just run a copy of the BIND exporter
   (https://github.com/prometheus-community/bind_exporter) than to make
   sure that all the logs are getting scraped appropriately and the
   path to get them into Loki works and keeps working all the time --
   it being easier to generate a no-data alert for a metric than it is
   to say "this log message we never get, we still haven't gotten it"
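
To make the metric idea concrete, here is a minimal sketch of what such a gauge could look like in the Prometheus text exposition format. The metric name dnssec_rrsig_soonest_expiry_seconds and the function are hypothetical; nothing in this thread says that BIND or bind_exporter exposes such a value today.

```python
def _escape_label(value):
    """Escape a label value for the Prometheus text exposition format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_expiry_metric(soonest_expiry_by_zone, now):
    """Render seconds until the soonest RRSIG expiration, per zone, as a gauge.

    `soonest_expiry_by_zone` maps zone name -> expiration (Unix timestamp);
    `now` is the current Unix timestamp.
    """
    name = "dnssec_rrsig_soonest_expiry_seconds"  # hypothetical metric name
    lines = [
        f"# HELP {name} Seconds until the soonest RRSIG expiration in the zone.",
        f"# TYPE {name} gauge",
    ]
    for zone in sorted(soonest_expiry_by_zone):
        remaining = soonest_expiry_by_zone[zone] - now
        label = _escape_label(zone)
        lines.append(f'{name}{{zone="{label}"}} {remaining}')
    return "\n".join(lines) + "\n"
```

An alert could then fire when the gauge drops below a threshold, and a missing series is itself something to alert on, which is the no-data point made above.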

And yes, I recognize that "well, Steve, the code's right over here, go 
to it" is a valid argument.


    -Steve

On 11/3/2023 6:09 AM, Vladimír Čunát via dns-operations wrote:

> My understanding is that in this case the signer was producing loud
> syslog warnings immediately when the issue happened (i.e. long before
> validation could fail).




Re: [dns-operations] anchors.atlas.ripe.net/ripe.net - DNSSEC bogus due expiration

2023-11-03 Thread Viktor Dukhovni
On Fri, Nov 03, 2023 at 11:09:02AM +0100, Vladimír Čunát via dns-operations wrote:

> On 01/11/2023 17.18, Viktor Dukhovni wrote:
> > Should authoritative [nameservers] have knobs to perform internal checks on
> > the signed zones they serve and at least syslog loud warnings?
> 
> My understanding is that in this case the signer was producing loud syslog
> warnings immediately when the issue happened (i.e. long before validation
> could fail).

Sure, but the warnings were far from a clear indication that re-signing
of the entire zone had stopped.  In any case, logging isn't exactly the
best interface for realtime monitoring.

I do think that exposing the next expiration time for monitoring and
likewise a list of zones where that time is too soon would be of value
to operators.  It doesn't obviate the need for active query probes,
those should still also happen, but I do think that operators would
benefit from such a (new) signal.

-- 
Viktor.



Re: [dns-operations] anchors.atlas.ripe.net/ripe.net - DNSSEC bogus due expiration

2023-11-03 Thread Vladimír Čunát via dns-operations
--- Begin Message ---

On 01/11/2023 17.18, Viktor Dukhovni wrote:

> Should authoritative nameservers have knobs to perform internal checks on
> the signed zones they serve and at least syslog loud warnings?

My understanding is that in this case the signer was producing loud
syslog warnings immediately when the issue happened (i.e. long before
validation could fail).


--- End Message ---


Re: [dns-operations] anchors.atlas.ripe.net/ripe.net - DNSSEC bogus due expiration

2023-11-03 Thread Steven Miller
I liked Viktor’s idea. It would be cool if time-to-re-sign and
time-to-signature-expiration were available on the json/xml stats port.
(Or are they and I missed it?  The last time I used the json/xml stuff, I
wasn’t getting metrics for signed zones, just the usual counters and the
time-to-expire for secondaries…)

-Steve

> On Nov 3, 2023, at 1:43 AM, Mark Andrews  wrote:
> 
> 
> 
>> On 3 Nov 2023, at 02:18, Viktor Dukhovni  wrote:
>> 
>> On Thu, Nov 02, 2023 at 09:34:17AM +0100, Stephane Bortzmeyer wrote:
>> 
>>>> Specifically, in the case of signed zones, monitoring MUST also include
>>>> regular checks of the remaining expiration time of at least the core
>>>> zone apex records (DNSKEY, SOA and NS), and ideally the whole zone, both
>>>> on the primary server and the secondaries.
>>> 
>>> Indeed. If you use Nagios or compatible (such as Icinga), I recommend
>>> this plugin for signatures monitoring:
>>> 
>>> http://dns.measurement-factory.com/tools/nagios-plugins/check_zone_rrsig_expiration.html
>> 
>> I wonder whether the widely used authoritative nameservers could do more
>> to help?
>> 
>> For example, BIND loads zone data into memory.  It should be able to
>> know the time of the soonest signature expiration for a zone, or at
>> least (if not loading the whole zone into memory) the soonest expiration
>> time of recently queried records.
> 
> When you let named perform the signing it does just that.  The RRSIGs are
> in a heap.  We look at the earliest expiration and figure out when it is
> due to be re-signed (could be in the past if the server was offline for a
> while).  We set a timer.  When that timer expires we re-sign that RRset plus
> several more along with an updated SOA record re-adding them to the heap.
> We set a timer for the next batch.  If the primary has been down too long
> and they have all expired the entire zone will be signed this way when the
> primary starts up.
> 
>> There could be a new "rndc" protocol verb that asks the nameserver for a
>> list of all the zones where the soonest expiration time is below some
>> threshold, or asks about a particular zone.
>> 
>> Of course in that case the monitoring agent would be in a "privileged"
>> position to query the nameserver's internal control plane, rather than
>> having to send queries through "the front door".
>> 
>> Both kinds of monitoring are likely important, but more visibility via
>> the control plane may be able to offer a precise/timely view.
>> 
>>   - Check each nameserver's control plane.
>>   - Check as much of the zone as possible.
>>   - Check each nameserver VIP over each supported protocol
>>     (UDP, TCP, DoT, DoQ, ...)
>>   - From multiple vantage points if possible/applicable when
>>     service is on anycast IPs.
>> 
>> Perhaps, through OARC, support development of monitoring plugins that
>> many operators can use?
>> 
>> If, after all the past incidents, minor and not so minor, operators
>> still aren't doing adequate monitoring, perhaps we (the software
>> and standards developers) haven't given them adequate tools?
>> 
>> --
>>   Viktor.
> 
> --
> Mark Andrews, ISC
> 1 Seymour St., Dundas Valley, NSW 2117, Australia
> PHONE: +61 2 9871 4742  INTERNET: ma...@isc.org
> 
> 



Re: [dns-operations] anchors.atlas.ripe.net/ripe.net - DNSSEC bogus due expiration

2023-11-02 Thread Mark Andrews



> On 3 Nov 2023, at 02:18, Viktor Dukhovni  wrote:
> 
> On Thu, Nov 02, 2023 at 09:34:17AM +0100, Stephane Bortzmeyer wrote:
> 
>>> Specifically, in the case of signed zones, monitoring MUST also include
>>> regular checks of the remaining expiration time of at least the core
>>> zone apex records (DNSKEY, SOA and NS), and ideally the whole zone, both
>>> on the primary server and the secondaries.
>> 
>> Indeed. If you use Nagios or compatible (such as Icinga), I recommend
>> this plugin for signatures monitoring:
>> 
>> http://dns.measurement-factory.com/tools/nagios-plugins/check_zone_rrsig_expiration.html
> 
> I wonder whether the widely used authoritative nameservers could do more
> to help?
> 
> For example, BIND loads zone data into memory.  It should be able to
> know the time of the soonest signature expiration for a zone, or at
> least (if not loading the whole zone into memory) the soonest expiration
> time of recently queried records.

When you let named perform the signing it does just that.  The RRSIGs are
in a heap.  We look at the earliest expiration and figure out when it is
due to be re-signed (could be in the past if the server was offline for a
while).  We set a timer.  When that timer expires we re-sign that RRset plus
several more along with an updated SOA record re-adding them to the heap.
We set a timer for the next batch.  If the primary has been down too long
and they have all expired the entire zone will be signed this way when the
primary starts up. 
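
The scheme described above can be illustrated with a toy model. This is not BIND's code: the class, the batch size, the re-sign lead time and the validity period are all made up for the example, and the SOA update that accompanies each batch is left out.

```python
import heapq


class ResignQueue:
    """Toy model: RRsets ordered by RRSIG expiration in a min-heap."""

    def __init__(self, validity, resign_lead, batch_size):
        self.validity = validity        # lifetime given to new signatures
        self.resign_lead = resign_lead  # re-sign this long before expiry
        self.batch_size = batch_size    # RRsets re-signed per timer firing
        self.heap = []                  # entries are (expiration, rrset name)

    def add(self, name, expiration):
        heapq.heappush(self.heap, (expiration, name))

    def next_timer(self):
        """Time at which the earliest-expiring RRset is due for re-signing.

        May lie in the past if the server was offline; None if empty.
        """
        if not self.heap:
            return None
        return self.heap[0][0] - self.resign_lead

    def fire(self, now):
        """Re-sign up to batch_size RRsets with the earliest expirations,
        put them back on the heap, and return their names."""
        resigned = []
        for _ in range(min(self.batch_size, len(self.heap))):
            _, name = heapq.heappop(self.heap)
            resigned.append(name)
        for name in resigned:
            self.add(name, now + self.validity)
        return resigned
```

In this model the value a monitoring interface would want to expose is simply the expiration at the top of the heap.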

> There could be a new "rndc" protocol verb that asks the nameserver for a
> list of all the zones where the soonest expiration time is below some
> threshold, or asks about a particular zone.
> 
> Of course in that case the monitoring agent would be in a "privileged"
> position to query the nameserver's internal control plane, rather than
> having to send queries through "the front door".
> 
> Both kinds of monitoring are likely important, but more visibility via
> the control plane may be able to offer a precise/timely view.
> 
> - Check each nameserver's control plane.
> - Check as much of the zone as possible.
> - Check each nameserver VIP over each supported protocol
>   (UDP, TCP, DoT, DoQ, ...)
> - From multiple vantage points if possible/applicable when
>   service is on anycast IPs.
> 
> Perhaps, through OARC, support development of monitoring plugins that
> many operators can use?
> 
> If, after all the past incidents, minor and not so minor, operators
> still aren't doing adequate monitoring, perhaps we (the software
> and standards developers) haven't given them adequate tools?
> 
> -- 
>Viktor.

-- 
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742  INTERNET: ma...@isc.org




Re: [dns-operations] anchors.atlas.ripe.net/ripe.net - DNSSEC bogus due expiration

2023-11-02 Thread Viktor Dukhovni
On Thu, Nov 02, 2023 at 09:34:17AM +0100, Stephane Bortzmeyer wrote:

> > Specifically, in the case of signed zones, monitoring MUST also include
> > regular checks of the remaining expiration time of at least the core
> > zone apex records (DNSKEY, SOA and NS), and ideally the whole zone, both
> > on the primary server and the secondaries.
> 
> Indeed. If you use Nagios or compatible (such as Icinga), I recommend
> this plugin for signatures monitoring:
> 
> http://dns.measurement-factory.com/tools/nagios-plugins/check_zone_rrsig_expiration.html

I wonder whether the widely used authoritative nameservers could do more
to help?

For example, BIND loads zone data into memory.  It should be able to
know the time of the soonest signature expiration for a zone, or at
least (if not loading the whole zone into memory) the soonest expiration
time of recently queried records.

There could be a new "rndc" protocol verb that asks the nameserver for a
list of all the zones where the soonest expiration time is below some
threshold, or asks about a particular zone.

Of course in that case the monitoring agent would be in a "privileged"
position to query the nameserver's internal control plane, rather than
having to send queries through "the front door".

Both kinds of monitoring are likely important, but more visibility via
the control plane may be able to offer a precise/timely view.

- Check each nameserver's control plane.
- Check as much of the zone as possible.
- Check each nameserver VIP over each supported protocol
  (UDP, TCP, DoT, DoQ, ...)
- From multiple vantage points if possible/applicable when
  service is on anycast IPs.

Perhaps, through OARC, support development of monitoring plugins that
many operators can use?

If, after all the past incidents, minor and not so minor, operators
still aren't doing adequate monitoring, perhaps we (the software
and standards developers) haven't given them adequate tools?

-- 
Viktor.


Re: [dns-operations] anchors.atlas.ripe.net/ripe.net - DNSSEC bogus due expiration

2023-11-02 Thread Stephane Bortzmeyer
On Wed, Nov 01, 2023 at 12:18:42PM -0400,
 Viktor Dukhovni  wrote 
 a message of 67 lines which said:

> Specifically, in the case of signed zones, monitoring MUST also include
> regular checks of the remaining expiration time of at least the core
> zone apex records (DNSKEY, SOA and NS), and ideally the whole zone, both
> on the primary server and the secondaries.

Indeed. If you use Nagios or compatible (such as Icinga), I recommend
this plugin for signatures monitoring:

http://dns.measurement-factory.com/tools/nagios-plugins/check_zone_rrsig_expiration.html

(If you use Debian, it is in the package monitoring-plugins-contrib.)



Re: [dns-operations] anchors.atlas.ripe.net/ripe.net - DNSSEC bogus due expiration

2023-11-01 Thread Viktor Dukhovni
On Wed, Nov 01, 2023 at 04:49:01PM +0100, Mark Andrews wrote:

> It shouldn’t take any time as the bogus records shouldn’t have been cached.
> 

Right, unlike mismatched parent-side DS RRs, RRSIG expiration heals
fairly promptly once the zone is resigned at the origin.

I am repeatedly surprised when I hear of operators finding out about
RRSIG expiration after the fact from 3rd parties.

Somehow the reflexive knowledge that DNS monitoring means not only:

- Is it still working at this very moment

but also:

- Is it about to stop working if nothing is done soon

appears to not have become an ingrained part of the operator culture.

* What can we as a community do to get the message out?
* What tooling improvements could make this easier for operators?

Specifically, in the case of signed zones, monitoring MUST also include
regular checks of the remaining expiration time of at least the core
zone apex records (DNSKEY, SOA and NS), and ideally the whole zone, both
on the primary server and the secondaries.

There needs to be a minimum acceptable remaining RRSIG time that's some
reasonable fraction of the total RRSIG lifetime, which if crossed leaves
enough time for the responsible operator to react and rectify any
issues.  My tiny zones are monitored to not go below ~π days of
remaining RRSIG validity. :-)

ldns-verify-zone -e P0Y0M3DT3H23M54S -V1 ...

[ Of course that minimum time needs to be less than the threshold at which
  extant records are normally resigned. ]
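
The -e argument in the command above appears to be an ISO 8601-style duration, and π days does work out to 3 days, 3 hours, 23 minutes and 54 seconds. A small sketch for deriving such a string from a number of days follows; the function name is made up for illustration.

```python
import math


def days_to_iso8601_duration(days):
    """Format a number of days as PnYnMnDTnHnMnS.

    Only the day/hour/minute/second fields are used; years and months
    are always written as 0.
    """
    total = round(days * 86400)          # whole seconds
    d, rest = divmod(total, 86400)
    h, rest = divmod(rest, 3600)
    m, s = divmod(rest, 60)
    return f"P0Y0M{d}DT{h}H{m}M{s}S"


print(days_to_iso8601_duration(math.pi))  # P0Y0M3DT3H23M54S
```

The same helper could turn a threshold expressed as a fraction of the signature lifetime into a duration, e.g. a quarter of a 14-day validity is days_to_iso8601_duration(14 / 4).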

Should authoritative nameservers have knobs to perform internal checks on
the signed zones they serve and at least syslog loud warnings?

If there were some protocol to get a message into a monitoring system,
that would be even better...

If operators cannot or do not implement the requisite monitoring on their
own, is it possible to make it easy enough to do, and sufficiently
prominently documented or otherwise well known, that they start doing it?

"Unmonitored critical service", especially when it involves security,
should be an oxymoron.

-- 
Viktor.



Re: [dns-operations] anchors.atlas.ripe.net/ripe.net - DNSSEC bogus due expiration

2023-11-01 Thread Mark Andrews
It shouldn’t take any time as the bogus records shouldn’t have been cached.

-- 
Mark Andrews

> On 1 Nov 2023, at 15:06, Paul de Weerd  wrote:
> 
> Dear Matthew,
> 
>> On 01/11/2023 12:13, Matthew Richardson via dns-operations wrote:
>> Our systems use some RIPE Atlas anchors for general connectivity
>> monitoring.  Just now, they all failed.
>> It looks as if DNSSEC has expired:-
>> https://dnsviz.net/d/anchors.atlas.ripe.net/dnssec/
>> It also looks as if other things in ripe.net may also have expired (eg
>> www.ripe.net  when looking for a contact to advise of this).
> 
> Indeed, there was an issue with the DNSSEC signatures on the ripe.net zone 
> expiring earlier today (20231101104448).  As Stephane commented, this was 
> resolved at 12:15 (UTC) on our end, but as usual it may take some time for 
> the fixed zone to propagate to all caches.
> 
> We are working on a post mortem about this incident and will share that with 
> the community ASAP.
> 
> For future reference, in case of issues with the ripe.net services, 
> https://status.ripe.net/ should be the go-to place.  Admittedly, with the 
> ripe.net zone bogus, that was also unavailable - something more to consider 
> going forward.
> 
> Best regards,
> 
> Paul de Weerd
> Manager Global Information Infrastructure team
> RIPE NCC



Re: [dns-operations] anchors.atlas.ripe.net/ripe.net - DNSSEC bogus due expiration

2023-11-01 Thread Paul de Weerd

Dear Matthew,

On 01/11/2023 12:13, Matthew Richardson via dns-operations wrote:

> Our systems use some RIPE Atlas anchors for general connectivity
> monitoring.  Just now, they all failed.
>
> It looks as if DNSSEC has expired:-
>
> https://dnsviz.net/d/anchors.atlas.ripe.net/dnssec/
>
> It also looks as if other things in ripe.net may also have expired (eg
> www.ripe.net  when looking for a contact to advise of this).


Indeed, there was an issue with the DNSSEC signatures on the ripe.net 
zone expiring earlier today (20231101104448).  As Stephane commented, 
this was resolved at 12:15 (UTC) on our end, but as usual it may take 
some time for the fixed zone to propagate to all caches.


We are working on a post mortem about this incident and will share that 
with the community ASAP.


For future reference, in case of issues with the ripe.net services, 
https://status.ripe.net/ should be the go-to place.  Admittedly, with 
the ripe.net zone bogus, that was also unavailable - something more to 
consider going forward.


Best regards,

Paul de Weerd
Manager Global Information Infrastructure team
RIPE NCC


Re: [dns-operations] anchors.atlas.ripe.net/ripe.net - DNSSEC bogus due expiration

2023-11-01 Thread Stephane Bortzmeyer
On Wed, Nov 01, 2023 at 01:37:14PM +0100,
 Stephane Bortzmeyer  wrote 
 a message of 17 lines which said:

> > It looks as if DNSSEC has expired:-
> 
> It seems it has been repaired around 1215 UTC.

https://twitter.com/ripencc/status/1719712189496311986

"Our services have been restored and all services are operational. We
believe the root cause of the issue was DNSSEC-related, and we are
continuing to monitor the situation. We will soon share a postmortem
on our status page: https://status.ripe.net"



Re: [dns-operations] anchors.atlas.ripe.net/ripe.net - DNSSEC bogus due expiration

2023-11-01 Thread Stephane Bortzmeyer
On Wed, Nov 01, 2023 at 11:13:15AM +,
 Matthew Richardson via dns-operations  wrote 
 a message of 64 lines which said:

> It looks as if DNSSEC has expired:-

It seems it has been repaired around 1215 UTC.


[dns-operations] anchors.atlas.ripe.net/ripe.net - DNSSEC bogus due expiration

2023-11-01 Thread Matthew Richardson via dns-operations
--- Begin Message ---
Our systems use some RIPE Atlas anchors for general connectivity
monitoring.  Just now, they all failed.

It looks as if DNSSEC has expired:-

https://dnsviz.net/d/anchors.atlas.ripe.net/dnssec/

It also looks as if other things in ripe.net may also have expired (eg
www.ripe.net when looking for a contact to advise of this).

--
Best wishes,
Matthew
--- End Message ---