Re: [dns-operations] anchors.atlas.ripe.net/ripe.net - DNSSEC bogus due expiration

2023-11-03 Thread Steven Miller

It'd still be good to have that exposed as a metric, since:

 * that way you don't have to wait to make the mistake (or to find the
   logs from someone else's mistake) in order to wrap alerting around it
 * the metric's more or less the metric forever-ish, while it seems
   more likely that a well-intentioned phrasing change in one of the
   logs could screw up whatever pattern's being used to match it
 * I personally think that the metric is somehow more in my face than
   the logs (e.g., "oh look, I dumped the metrics with a curl/wget and
   that looks very much like a counter we need to wrap something
   around")
 * for those living in the Prometheus/Grafana/Loki ecosystem, it may be
   a bit easier to just run a copy of the BIND exporter
   (https://github.com/prometheus-community/bind_exporter) than to make
   sure that all the logs are getting scraped appropriately and the
   path to get them into Loki works and keeps working all the time --
   it being easier to generate a no-data alert for a metric than it is
   to say "this log message we never get, we still haven't gotten it"

And yes, I recognize that "well, Steve, the code's right over here, go 
to it" is a valid argument.


    -Steve
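
A minimal sketch of what such a metric could look like on the wire, in the
Prometheus text exposition format. The metric name
dnssec_soonest_rrsig_expiry_seconds and the render_metrics helper are
hypothetical, made up for illustration; neither BIND nor bind_exporter is
known to emit this today.

```python
import time


def render_metrics(soonest_expiry_by_zone, now=None):
    """Render one gauge sample per zone in Prometheus text exposition format.

    soonest_expiry_by_zone maps a zone name to the Unix timestamp of its
    soonest RRSIG expiration. The metric name is hypothetical.
    """
    now = time.time() if now is None else now
    name = "dnssec_soonest_rrsig_expiry_seconds"
    lines = [
        "# HELP %s Seconds until the soonest RRSIG expiration in the zone." % name,
        "# TYPE %s gauge" % name,
    ]
    for zone in sorted(soonest_expiry_by_zone):
        # Negative values mean at least one signature has already expired.
        remaining = soonest_expiry_by_zone[zone] - now
        lines.append('%s{zone="%s"} %d' % (name, zone, remaining))
    return "\n".join(lines) + "\n"
```

With a gauge like this, "alert when the value drops below N days" and
"alert when the series goes missing" are both ordinary rules, which is the
no-data case that is awkward to express against logs.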

On 11/3/2023 6:09 AM, Vladimír Čunát via dns-operations wrote:

> My understanding is that in this case the signer was producing loud
> syslog warnings immediately when the issue happened (i.e. long before
> validation could fail).


___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] anchors.atlas.ripe.net/ripe.net - DNSSEC bogus due expiration

2023-11-03 Thread Viktor Dukhovni
On Fri, Nov 03, 2023 at 11:09:02AM +0100, Vladimír Čunát via dns-operations 
wrote:

> On 01/11/2023 17.18, Viktor Dukhovni wrote:
> > Should authoritative [nameservers] have knobs to perform internal checks on
> > the signed zones they serve and at least syslog loud warnings?
> 
> My understanding is that in this case the signer was producing loud syslog
> warnings immediately when the issue happened (i.e. long before validation
> could fail).

Sure, but the warnings were far from a clear indication that re-signing
of the entire zone had stopped.  In any case, logging isn't exactly the
best interface for realtime monitoring.

I do think that exposing the next expiration time for monitoring, and
likewise a list of zones where that time is too soon, would be of value
to operators.  It doesn't obviate the need for active query probes,
which should still happen, but I do think that operators would benefit
from such a (new) signal.

-- 
Viktor.
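
As a rough illustration of the signal described above, here is a sketch
that takes RRSIG expiration fields in their usual presentation form
(YYYYMMDDHHmmSS, UTC) and reports the zones whose soonest expiration falls
inside a threshold. The function names and the input shape (a dict of zone
name to a list of expiration strings) are assumptions for the example; how
a nameserver would actually expose this is exactly the open question.

```python
from datetime import datetime, timedelta, timezone


def parse_rrsig_time(text):
    """Parse an RRSIG expiration field written as YYYYMMDDHHmmSS (UTC)."""
    return datetime.strptime(text, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def zones_expiring_soon(expirations_by_zone, now, threshold):
    """Return {zone: soonest expiration} for zones under the threshold.

    expirations_by_zone: hypothetical input, zone name -> list of RRSIG
    expiration strings. threshold is a timedelta.
    """
    soon = {}
    for zone, stamps in expirations_by_zone.items():
        soonest = min(parse_rrsig_time(s) for s in stamps)
        if soonest - now < threshold:
            soon[zone] = soonest
    return soon
```

A monitoring agent could call zones_expiring_soon(data, now, timedelta(days=3))
and alert on any non-empty result; already expired signatures show up too,
since their remaining time is negative.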



Re: [dns-operations] anchors.atlas.ripe.net/ripe.net - DNSSEC bogus due expiration

2023-11-03 Thread Vladimír Čunát via dns-operations
On 01/11/2023 17.18, Viktor Dukhovni wrote:

> Should authoritative [nameservers] have knobs to perform internal checks on
> the signed zones they serve and at least syslog loud warnings?

My understanding is that in this case the signer was producing loud
syslog warnings immediately when the issue happened (i.e. long before
validation could fail).



Re: [dns-operations] anchors.atlas.ripe.net/ripe.net - DNSSEC bogus due expiration

2023-11-03 Thread Steven Miller
I liked Viktor’s idea. It would be cool if time-to-re-sign and 
time-to-signature-expiration were available on the json/xml stats port. (Or are 
they and I missed it?  The last time I used the json/xml stuff, I wasn’t 
getting metrics for signed zones, just the usual counters and the 
time-to-expire for secondaries…)

-Steve

> On Nov 3, 2023, at 1:43 AM, Mark Andrews wrote:
> 
> 
> 
>> On 3 Nov 2023, at 02:18, Viktor Dukhovni wrote:
>> 
>> On Thu, Nov 02, 2023 at 09:34:17AM +0100, Stephane Bortzmeyer wrote:
>> 
>>>> Specifically, in the case of signed zones, monitoring MUST also include
>>>> regular checks of the remaining expiration time of at least the core
>>>> zone apex records (DNSKEY, SOA and NS), and ideally the whole zone, both
>>>> on the primary server and the secondaries.
>>> 
>>> Indeed. If you use Nagios or compatible (such as Icinga), I recommend
>>> this plugin for signatures monitoring:
>>> 
>>> http://dns.measurement-factory.com/tools/nagios-plugins/check_zone_rrsig_expiration.html
>> 
>> I wonder whether the widely used authoritative [nameservers] could do
>> more to help?
>> 
>> For example, BIND loads zone data into memory.  It should be able to
>> know the time of the soonest signature expiration for a zone, or at
>> least (if not loading the whole zone into memory) the soonest
>> expiration time of recently queried records.
> 
> When you let named perform the signing it does just that.  The RRSIGs are
> in a heap.  We look at the earliest expiration and figure out when it is
> due to be re-signed (could be in the past if the server was offline for a
> while).  We set a timer.  When that timer expires we re-sign that RRset plus
> several more along with an updated SOA record re-adding them to the heap.
> We set a timer for the next batch.  If the primary has been down too long
> and they have all expired the entire zone will be signed this way when the
> primary starts up.
> 
>> There could be a new "rndc" protocol verb that asks the nameserver for a
>> list of all the zones where the soonest expiration time is below some
>> threshold, or asks about a particular zone.
>> 
>> Of course in that case the monitoring agent would be in a "privileged"
>> position to query the nameserver's internal control plane, rather than
>> having to send queries through "the front door".
>> 
>> Both kinds of monitoring are likely important, but more visibility via
>> the control plane may be able to offer a precise/timely view.
>> 
>>   - Check each nameserver's control plane.
>>   - Check as much of the zone as possible.
>>   - Check each nameserver VIP over each supported protocol
>>     (UDP, TCP, DoT, DoQ, ...)
>>   - From multiple vantage points if possible/applicable when
>>     service is on anycast IPs.
>> 
>> Perhaps, through OARC, support development of monitoring plugins that
>> many operators can use?
>> 
>> If, after all the past incidents minor and not so minor, operators
>> still aren't doing adequate monitoring, perhaps we (the software
>> and standards developers) haven't given them adequate tools?
>> 
>> --
>>   Viktor.
> 
> --
> Mark Andrews, ISC
> 1 Seymour St., Dundas Valley, NSW 2117, Australia
> PHONE: +61 2 9871 4742  INTERNET: ma...@isc.org
> 
> 
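
The re-signing loop Mark describes in the quoted text above (RRSIGs kept in
a heap ordered by expiration, a timer set from the earliest one, a batch
re-signed and pushed back) can be sketched roughly as below. This is an
illustration of the idea only, not named's code: the parameter names
resign_lead, validity and batch_size are assumptions, the SOA update is
left out, and "signing" is reduced to assigning a new expiration time.

```python
import heapq


def next_resign_time(heap, resign_lead):
    """Time at which the earliest-expiring RRset is due for re-signing.

    heap holds (expiration, rrset_name) tuples. The result may lie in the
    past, e.g. if the server was offline for a while.
    """
    expiration, _name = heap[0]
    return expiration - resign_lead


def resign_batch(heap, now, validity, batch_size):
    """Re-sign up to batch_size RRsets with the earliest expirations.

    Each re-signed RRset is re-added to the heap with a fresh expiration.
    Returns the names that were re-signed, earliest first.
    """
    count = min(batch_size, len(heap))
    resigned = [heapq.heappop(heap)[1] for _ in range(count)]
    for name in resigned:
        heapq.heappush(heap, (now + validity, name))
    return resigned
```

The value returned by next_resign_time after each batch is the kind of
"time-to-re-sign" figure that could be exported next to the soonest
expiration itself (heap[0][0]).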
