Re: [dns-operations] anchors.atlas.ripe.net/ripe.net - DNSSEC bogus due expiration

2023-11-03 Thread Steven Miller

It'd still be good to have that exposed as a metric, since:

 * that way you don't have to wait to make the mistake (or to find the
   logs from someone else's mistake) in order to wrap alerting around it
 * the metric's more or less the metric forever-ish, while it seems
   more likely that a well-intentioned phrasing change in one of the
   logs could screw up whatever pattern's being used to match it
 * I personally think that the metric is somehow more in my face than
   the logs (e.g., "oh look, I dumped the metrics with a curl/wget and
   that looks very much like a counter we need to wrap something
   around")
 * for those living in the Prometheus/Grafana/Loki ecosystem, it may be
   a bit easier to just run a copy of the BIND exporter
   (https://github.com/prometheus-community/bind_exporter) than to make
   sure that all the logs are getting scraped appropriately and the
   path to get them into Loki works and keeps working all the time --
   it being easier to generate a no-data alert for a metric than it is
   to say "this log message we never get, we still haven't gotten it"
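To make the metric argument concrete, here is a minimal sketch of how per-zone signing state could be rendered in the Prometheus text exposition format. The metric names `dns_zone_rrsig_expiry_seconds` and `dns_zone_next_resign_seconds` and the `ZoneSigningState` type are hypothetical. Nothing in this thread says BIND or bind_exporter exposes them, and the sketch leaves open where the numbers come from.

```python
from dataclasses import dataclass


@dataclass
class ZoneSigningState:
    zone: str
    seconds_to_earliest_expiry: float  # soonest RRSIG expiration, relative to now
    seconds_to_next_resign: float      # when the signer plans its next re-sign batch


def escape_label(value: str) -> str:
    # Label values must escape backslash, double quote and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


# (metric name, help text, attribute on ZoneSigningState); the names are hypothetical.
METRICS = [
    ("dns_zone_rrsig_expiry_seconds",
     "Seconds until the earliest RRSIG in the zone expires.",
     "seconds_to_earliest_expiry"),
    ("dns_zone_next_resign_seconds",
     "Seconds until the signer's next scheduled re-sign.",
     "seconds_to_next_resign"),
]


def render_metrics(states: list[ZoneSigningState]) -> str:
    lines: list[str] = []
    for name, help_text, attr in METRICS:
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        for s in states:
            value = float(getattr(s, attr))
            lines.append(f'{name}{{zone="{escape_label(s.zone)}"}} {value!r}')
    # The text format expects a newline after the last sample.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_metrics([ZoneSigningState("example.net", 86400, 3600)]), end="")
```

An alert could then be something like `dns_zone_rrsig_expiry_seconds < 259200` (three days), paired with `absent(dns_zone_rrsig_expiry_seconds)` for the no-data case in the last bullet.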

And yes, I recognize that "well, Steve, the code's right over here, go 
to it" is a valid argument.


    -Steve

On 11/3/2023 6:09 AM, Vladimír Čunát via dns-operations wrote:


My understanding is that in this case the signer was producing loud 
syslog warnings immediately when the issue happened (i.e. long before 
validation could fail).


___
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations


Re: [dns-operations] anchors.atlas.ripe.net/ripe.net - DNSSEC bogus due expiration

2023-11-03 Thread Steven Miller
I liked Viktor’s idea. It would be cool if time-to-re-sign and 
time-to-signature-expiration were available on the json/xml stats port. (Or are 
they and I missed it?  The last time I used the json/xml stuff, I wasn’t 
getting metrics for signed zones, just the usual counters and the 
time-to-expire for secondaries…)

-Steve
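Mark's reply, quoted below, describes how named schedules re-signing from a heap of RRSIG expirations. A simplified model of that idea, which also yields the two numbers asked for here, might look like the following. It is not ISC's implementation. `resign_lead`, `validity`, the batch size and the JSON key names are assumptions for illustration, and the updated SOA that Mark mentions is left out.

```python
import heapq
import json
from typing import Optional


class ResignQueue:
    """Min-heap of (RRSIG expiration, RRset) pairs; all times are Unix seconds."""

    def __init__(self, resign_lead: int, validity: int) -> None:
        self.resign_lead = resign_lead  # re-sign this long before expiry (assumed)
        self.validity = validity        # lifetime given to fresh signatures (assumed)
        self._heap: list[tuple[int, str]] = []

    def add(self, expiration: int, rrset: str) -> None:
        heapq.heappush(self._heap, (expiration, rrset))

    def earliest_expiration(self) -> Optional[int]:
        return self._heap[0][0] if self._heap else None

    def next_resign_time(self) -> Optional[int]:
        # May lie in the past if the signer was offline for a while.
        exp = self.earliest_expiration()
        return None if exp is None else exp - self.resign_lead

    def resign_due(self, now: int, batch: int) -> list[str]:
        """Re-sign up to `batch` due RRsets and push them back with new expirations."""
        done = []
        while (self._heap and len(done) < batch
               and self._heap[0][0] - self.resign_lead <= now):
            _, rrset = heapq.heappop(self._heap)
            done.append(rrset)
        for rrset in done:  # re-add after the loop so a batch never re-signs twice
            heapq.heappush(self._heap, (now + self.validity, rrset))
        return done

    def stats(self, zone: str, now: int) -> str:
        exp = self.earliest_expiration()
        nxt = self.next_resign_time()
        return json.dumps({
            "zone": zone,
            "time-to-resign": None if nxt is None else max(0, nxt - now),
            "time-to-signature-expiration": None if exp is None else exp - now,
        })
```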

> On Nov 3, 2023, at 1:43 AM, Mark Andrews  wrote:
> 
> 
> 
>> On 3 Nov 2023, at 02:18, Viktor Dukhovni  wrote:
>> 
>> On Thu, Nov 02, 2023 at 09:34:17AM +0100, Stephane Bortzmeyer wrote:
>> 
>>>> Specifically, in the case of signed zones, monitoring MUST also include
>>>> regular checks of the remaining expiration time of at least the core
>>>> zone apex records (DNSKEY, SOA and NS), and ideally the whole zone, both
>>>> on the primary server and the secondaries.
>>> 
>>> Indeed. If you use Nagios or compatible (such as Icinga), I recommend
>>> this plugin for signatures monitoring:
>>> 
>>> http://dns.measurement-factory.com/tools/nagios-plugins/check_zone_rrsig_expiration.html
>> 
>> I wonder whether the widely used authoritative servers could do more
>> to help?
>> 
>> For example, BIND loads zone data into memory.  It should be able to
>> know the time of the soonest signature expiration for a zone, or at
>> least (if not loading the whole zone into memory) the soonest expiration
>> time of recently queried records.
> 
> When you let named perform the signing it does just that.  The RRSIGs are
> in a heap.  We look at the earliest expiration and figure out when it is
> due to be re-signed (could be in the past if the server was offline for a
> while).  We set a timer.  When that timer expires we re-sign that RRset plus
> several more along with an updated SOA record re-adding them to the heap.
> We set a timer for the next batch.  If the primary has been down too long
> and they have all expired the entire zone will be signed this way when the
> primary starts up.
> 
>> There could be a new "rndc" protocol verb that asks the nameserver for a
>> list of all the zones where the soonest expiration time is below some
>> threshold, or asks about a particular zone.
>> 
>> Of course in that case the monitoring agent would be in a "privileged"
>> position to query the nameserver's internal control plane, rather than
>> having to send queries through "the front door".
>> 
>> Both kinds of monitoring are likely important, but more visibility via
>> the control plane may be able to offer a precise/timely view.
>> 
>>   - Check each nameserver's control plane.
>>   - Check as much of the zone as possible.
>>   - Check each nameserver VIP over each supported protocol
>> (UDP, TCP, DoT, DoQ, ...)
>>   - From multiple vantage points if possible/applicable when
>> service is on anycast IPs.
>> 
>> Perhaps OARC could support development of monitoring plugins that
>> many operators can use?
>> 
>> If, after all the past incidents, minor and not-so-minor operators
>> still aren't doing adequate monitoring, perhaps we (the software
>> and standards developers) haven't given them adequate tools?
>> 
>> --
>>   Viktor.
> 
> --
> Mark Andrews, ISC
> 1 Seymour St., Dundas Valley, NSW 2117, Australia
> PHONE: +61 2 9871 4742  INTERNET: ma...@isc.org
> 
> 

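The check called for in the quoted text above (remaining expiration time of at least the apex DNSKEY, SOA and NS signatures, on the primary and on every secondary) comes down to reading the expiration field of each RRSIG. A minimal stdlib-only sketch, assuming the RRSIG RDATA text has already been collected per server (for example from `dig +dnssec` output). The function names and the threshold are hypothetical, and this is not the logic of the Nagios plugin linked above. The expiration field may be written as YYYYMMDDHHmmSS in UTC or as seconds since the epoch (RFC 4034, section 3.2); the sketch ignores the 32-bit serial-arithmetic wraparound that strict comparisons call for.

```python
from datetime import datetime, timedelta, timezone


def parse_sig_time(field: str) -> datetime:
    """Parse an RRSIG expiration/inception field (RFC 4034, section 3.2)."""
    if len(field) == 14 and field.isdigit():
        # YYYYMMDDHHmmSS, always UTC
        return datetime.strptime(field, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    # Otherwise: unsigned decimal seconds since 1970-01-01 00:00:00 UTC
    return datetime.fromtimestamp(int(field), tz=timezone.utc)


def rrsig_remaining(rdata: str, now: datetime) -> timedelta:
    # RDATA fields: type-covered algorithm labels original-ttl
    #               EXPIRATION inception key-tag signer signature
    return parse_sig_time(rdata.split()[4]) - now


def expiring_soon(observed: dict[tuple[str, str], str], now: datetime,
                  threshold: timedelta) -> list[tuple[tuple[str, str], timedelta]]:
    """observed maps (server, covered type) -> RRSIG RDATA text.

    Returns the entries expiring within `threshold` (or already expired), soonest first.
    """
    remaining = [(key, rrsig_remaining(rdata, now)) for key, rdata in observed.items()]
    return sorted((r for r in remaining if r[1] < threshold), key=lambda r: r[1])
```

A list of "everything below the threshold" is also roughly what the hypothetical `rndc` verb Viktor suggests would report, only gathered through the front door rather than the control plane.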

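Viktor's checklist in the quoted thread above grows quickly: every nameserver VIP, over every transport, from every vantage point. A small sketch of enumerating that matrix; `probe` stands in for whatever actually sends the query and is purely hypothetical, as are the names used in the test.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable


@dataclass(frozen=True)
class Check:
    server: str     # a nameserver VIP (or its control-plane endpoint)
    transport: str  # "udp", "tcp", "dot", "doq", ...
    vantage: str    # where the probe runs from (matters for anycast)


def plan_checks(servers: list[str], transports: list[str],
                vantages: list[str]) -> list[Check]:
    """Every server x transport x vantage-point combination."""
    return [Check(s, t, v) for s, t, v in product(servers, transports, vantages)]


def failing(checks: list[Check], probe: Callable[[Check], bool]) -> list[Check]:
    """Run the hypothetical `probe` for each check and keep the ones that fail."""
    return [c for c in checks if not probe(c)]
```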

Re: [dns-operations] OpenDNS, Google, Nominet - New delegation update failure mode

2020-04-04 Thread Steven Miller
Also, the whole point of the DNS was to eliminate a flat namespace (the old 
hosts.txt file, for the three people here old enough to remember that!), so if 
the barriers to entry for new TLDs are low, everyone gets one, and now we have 
a large, flat namespace again. There are some operations costs to the root 
operators (not huge, but there) as well. 

Those are all mild oversimplifications, but the idea is basically right. 

-Steve

> On Apr 4, 2020, at 11:01 AM, John Levine  wrote:
> 
> In article <85882353-8f7c-365b-43e7-6092ad82c...@plum.ovh> you write:
>> I have a question: why do domain names have to be assigned by ICANN?
>> I'd expect everyone could have his/her own domain name; naming is freedom.
> 
> That's not how the Internet works.  There's only one set of root
> servers and for historical and practical reasons they take ICANN's
> advice about what goes into the root zone.
> 
> People have been disagreeing about that for the past 25 years and
> there's vast amounts of material about it on the net.  But this has
> gotten way outside anything related to DNS operations.
> 
> R's,
> John
> 
> PS: I have a friend who also wants .plum.  Who decides who gets it?



Re: [dns-operations] solutions for DDoS mitigation of DNS

2020-04-03 Thread Steven Miller
Adding more servers and going to 10G NICs seems relatively inexpensive 
and that should be helpful for "casual" attacks where you're being used 
as a reflector.  In those attacks, no one's out to attack you: they just 
want you to attack someone else, and don't mind eating all your 
bandwidth/CPU/whatever in order to do that.
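Why reflectors are attractive is mostly arithmetic: a small spoofed query buys a much larger response aimed at the victim. The sizes below are illustrative assumptions, not measurements, and IP/UDP header overhead is ignored.

```python
def amplification(query_bytes: int, response_bytes: int) -> float:
    """Bytes delivered to the victim per byte the attacker sends."""
    return response_bytes / query_bytes


def bits_per_second(rate_qps: int, size_bytes: int) -> int:
    return rate_qps * size_bytes * 8


# Illustrative numbers only: 100k spoofed queries/s of ~60 bytes, ~2500-byte answers.
QPS, QUERY, RESPONSE = 100_000, 60, 2_500
attacker_bps = bits_per_second(QPS, QUERY)   # 48 Mbit/s spent by the attacker
victim_bps = bits_per_second(QPS, RESPONSE)  # 2 Gbit/s leaving your servers

if __name__ == "__main__":
    print(f"{amplification(QUERY, RESPONSE):.1f}x amplification")
```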


Adding more bandwidth without enabling RRL or putting some sort of 
filtering in place will make your facilities more attractive to 
attackers, though.  I'd expect that attackers are passing around lists 
of particularly good sites for reflector attacks, and the fewer controls 
you have, and the more bandwidth you have, the more attractive you are 
for use in an attack -- and therefore the more likely you are to have 
your resources saturated.


I think RRL should be safe to run all the time.  You wouldn't want to 
scramble to enable it during an attack.
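The idea behind RRL can be sketched as below. This is a simplified model loosely following BIND's response rate limiting: count identical responses per client netblock, drop the excess, and "slip" some truncated (TC=1) replies so legitimate clients can retry over TCP. It is not BIND's algorithm; here the key is just client netblock, query name and type, and the class and parameter names are hypothetical. In BIND itself this is turned on with a `rate-limit` statement in named.conf.

```python
import ipaddress


class ResponseRateLimiter:
    """Simplified response rate limiting (hypothetical names, not BIND's code)."""

    def __init__(self, responses_per_second: int, slip: int = 2,
                 v4_prefix: int = 24, v6_prefix: int = 56) -> None:
        self.rps = responses_per_second
        self.slip = slip            # every `slip`-th excess response is truncated; 0 = always drop
        self.v4_prefix = v4_prefix  # clients are grouped into netblocks of this size
        self.v6_prefix = v6_prefix
        self._window: dict = {}     # key -> (second, responses counted in that second)
        self._excess: dict = {}     # key -> over-limit responses seen so far

    def _netblock(self, client_ip: str):
        addr = ipaddress.ip_address(client_ip)
        plen = self.v4_prefix if addr.version == 4 else self.v6_prefix
        return ipaddress.ip_network(f"{addr}/{plen}", strict=False)

    def decide(self, client_ip: str, qname: str, qtype: str, now: float) -> str:
        """Return "answer", "truncate" (reply with TC=1 to invite a TCP retry) or "drop"."""
        key = (self._netblock(client_ip), qname.lower(), qtype)
        second = int(now)
        start, count = self._window.get(key, (second, 0))
        if start != second:  # a new one-second window: reset the budget
            count = 0
        count += 1
        self._window[key] = (second, count)
        if count <= self.rps:
            return "answer"
        self._excess[key] = self._excess.get(key, 0) + 1
        if self.slip and self._excess[key] % self.slip == 0:
            return "truncate"
        return "drop"
```

Because a well-behaved resolver retries a truncated answer over TCP, and spoofed sources cannot complete a TCP handshake, limiting in this style costs real clients little, which is part of why leaving it on all the time is reasonable. A production version would also need to expire idle keys.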


I don't know if there are commercial devices I would trust to be helpful 
in these situations, but when I was doing DNS DDoS work, nothing 
commercial was going to scale enough, so I didn't consider them much. :-)


The hard thing about these attacks is that there's always some time when 
local resources aren't enough: when you upgrade to 50Gbit/sec of 
capacity and the next attack is 60Gbit/sec of traffic.  I'd expect some 
correlation between "really high bandwidth attacks" and "attacks that 
are meant to hurt you instead of just use you as a reflector" but that 
correlation won't be perfect.  It's unfortunate that in the DNS attack 
world, for a lot of attacks, all you can do is have massively more 
capacity than you need on a daily basis.
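The capacity race described above is easy to put into numbers. A sketch assuming attack traffic spreads evenly across servers (for example via anycast or ECMP), which real attacks may not do; it also ignores that the upstream link or gateway often saturates first. The `headroom` multiplier is an assumed safety margin.

```python
import math


def servers_needed(attack_gbps: float, nic_gbps: float, headroom: float = 1.5) -> int:
    """Servers (one NIC each) needed to absorb an attack with a safety margin."""
    # headroom is an assumed multiplier for legitimate load and uneven spreading.
    return math.ceil(attack_gbps * headroom / nic_gbps)


def absorbs(capacity_gbps: float, attack_gbps: float) -> bool:
    return capacity_gbps >= attack_gbps
```

With 1Gbps interfaces, even a perfectly spread 20Gbps flood needs twenty servers before any margin, and the 50 vs. 60Gbit/sec case fails however the servers are arranged.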


The advantage to moving DNS into a cloud provider is that they have the 
resources to massively, crazily overprovision, to the point where it 
would be hard even for a nation-state to mount a big enough attack to 
take them down.  I'm most familiar with Cloudflare (I have never worked 
there, for the record) but certainly there are other companies worth 
looking at.  However, if you still have your nameservers in the public 
set of NS records for your domains, you'll still be vulnerable.  Some of 
these providers can probably load your zones using you as a shadow 
master: they just do a zone transfer from your DNS infrastructure, then 
serve all the queries from their own systems.
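Two checks follow from the shadow-master arrangement: that none of your own exposed servers remain in the public NS set, and that the provider's copies keep up with your zone. A sketch with hypothetical function names; SOA serials are compared with RFC 1982 serial number arithmetic because they wrap at 2^32.

```python
def _fqdn(name: str) -> str:
    return name.lower().rstrip(".") + "."


def exposed_nameservers(public_ns: list[str], self_hosted: list[str]) -> list[str]:
    """Names in the public NS set that point at infrastructure you run yourself."""
    own = {_fqdn(n) for n in self_hosted}
    return sorted(n for n in map(_fqdn, public_ns) if n in own)


def serial_lt(a: int, b: int) -> bool:
    """RFC 1982 comparison of 32-bit SOA serials: True if a precedes b."""
    return a != b and ((a < b and b - a < 2**31) or (a > b and a - b > 2**31))


def stale_secondaries(primary_serial: int, serials: dict[str, int]) -> list[str]:
    """Secondaries whose SOA serial lags the primary's."""
    return sorted(name for name, s in serials.items() if serial_lt(s, primary_serial))
```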


That's my perspective.  Hopefully it's not too out of date.

    -Steve

On 4/3/2020 4:18 AM, Tessa Plum wrote:

Hi Steve

I really appreciate your kind private message, though I would
like to reply to the list.


We run authoritative name servers only, and the zone data is for the
university only.


When the attack happened, the bandwidth observed at our gateway was
about 20Gbps. That made the name servers completely unresponsive. Each
name server has only a 1Gbps interface to the Internet, so it died.


We were considering the following actions:
1. increase bandwidth to both the inbound gateway and the VLAN for the nameservers.
2. upgrade the nameservers' network interfaces to 10Gbps.
3. run multiple servers as a cluster.
4. try to get a commercial device to analyze and stop this kind of
attack.
5. enable RRL when an attack happens.
6. I will try to suggest to the administrator that we run secondary
nameservers on professional hosting, such as Cloudflare, Akamai, AWS Route 53, etc.
  (Could easyDNS, DNSimple, DNSMadeEasy, NS1 also be considered?)

What do you think of them?

Thank you.

regards
Tessa





Re: [dns-operations] solutions for DDoS mitigation of DNS

2020-04-03 Thread Steven Miller
Essentially, yes.  Some increase in capacity on your side plus RRL will 
certainly keep you safer, but it's no guarantee.


Though to be clear, every few years, someone's going to hit a public DNS 
provider with enough load to cause a problem.  IMHO that'll happen less 
on average, and the mitigation time will be lower on average, and the 
pain level for you will be lower on average (no scrambling for 
resources, the ability to say "yeah, a big chunk of the Internet's down, 
I'll let you know when it's over" :-)) than if you run
your own infrastructure.


It's a really unfortunate state of affairs.

    -Steve

On 4/3/2020 5:03 AM, Tessa Plum wrote:
So there's no way to stop a reflector attack short of migrating the
servers to a professional IDC?


Thanks.

