Gratuitous AXFRs of RPZ after 9.18.11

2023-01-26 Thread John Thurston
I have a primary server and a couple of secondaries. After making 
adjustments yesterday to my RPZ (which almost never changes), I noticed 
an oddity: one of my secondaries is performing gratuitous AXFRs of the 
RPZ. This isn't a huge performance issue, since my RPZ is only 7.3 KB, 
but I want to understand why this secondary is doing it when the other 
secondaries are not, and when it is _not_ performing such gratuitous 
AXFRs of its other zones. Of note, the secondary in question has a 
"twin" for which the output of named-checkconf -px is identical 
(excepting the host-specific keys used for rndc access). That "twin" is 
behaving as expected.


To recap: the troublesome server has several secondary zones defined, 
and all but the RPZ are transferring as expected. The troublesome server 
has a "twin", which is behaving correctly for all of the secondary zones.


The SOA-record for my RPZ looks like so:

;; ANSWER SECTION:
rpz.local.  300  IN  SOA  rpz.local. hostmaster.state.ak.us. 22 3600 1800 432000 60


And I can see my several secondaries querying the primary for the 
SOA record on a regular basis. With a 'refresh' value of only 3600 in 
the SOA, this is what I expect to see. What I don't expect to see is 
that the troublesome secondary then follows each of those SOA queries 
with an AXFR request, like so:


26-Jan-2023 15:25:40.175 client @0x7f19691c1280 
10.213.96.197#37631/key from-azw (rpz.local): view azw: query: 
rpz.local IN SOA -SE(0) (10.203.163.72)
26-Jan-2023 15:25:40.274 client @0x7f1968118970 
10.213.96.197#44769/key from-azw (rpz.local): view azw: query: 
rpz.local IN AXFR -ST (10.203.163.72)
26-Jan-2023 15:27:10.665 client @0x7f196925d6f0 
10.213.96.197#60123/key from-azw (rpz.local): view azw: query: 
rpz.local IN SOA -SE(0) (10.203.163.72)
26-Jan-2023 15:27:10.763 client @0x7f1968118970 
10.213.96.197#46011/key from-azw (rpz.local): view azw: query: 
rpz.local IN AXFR -ST (10.203.163.72)


When I dump the zone database from the secondary (rndc dumpdb -zone 
rpz.local), I can see the RPZ in it with the expected serial number and 
all of the expected records.
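
Comparing the serials directly tells the same story, roughly like this 
(the server names are placeholders):

$ dig +short @primary.example rpz.local SOA
$ dig +short @secondary.example rpz.local SOA
# both print the same SOA RDATA, with serial 22 in the third field,
# so the follow-up AXFR really is gratuitous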


And after typing all of the above, I ran rndc status to get the 
version of each, and discovered that the "twins" are not actually twins!


The troublesome host is:    9.18.11-1+ubuntu18.04.1+isc+2-Ubuntu

Its "twin" is:    9.18.10-1+ubuntu18.04.1+isc+1-Ubuntu

And now, when I study my xfer.log more closely, I see that the behavior 
changed this morning when I completed the update from 9.18.10 -> 9.18.11.


I'm not yet ready to revert, because this isn't affecting my business 
(this is a really small zone). Is anyone else seeing similar behavior?


--
Do things because you should, not just because you can.

John Thurston    907-465-8591
john.thurs...@alaska.gov
Department of Administration
State of Alaska


Re: rpz testing -> shut down hung fetch while resolving

2023-01-26 Thread Evan Hunt
On Thu, Jan 26, 2023 at 07:03:37PM +0100, Havard Eidnes via bind-users wrote:
> Hi,
> 
> I recently made an upgrade of BIND to version 9.18.11 on our
> resolver cluster, following the recent announcement.  Shortly
> thereafter I received reports that lookups of "known entries" in our
> quite small RPZ feed (it's around 1 MB on-disk) no longer succeed as
> expected; instead they take a long time, finally return SERVFAIL to
> the client, and along with this we get this log message:
> 
> Jan 26 18:41:27 xxx-res named[6179]: shut down hung fetch while resolving 
> 'known-rpz-entry.no/A'

This usually means there's a circular dependency somewhere in the
resolution or validation process. For example, we can't resolve a name
without looking up the address of a name server, but that lookup can't
succeed until the original name is resolved. The two lookups will wait on
each other for ten seconds, and then the whole query times out and issues
that log message.
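
A contrived illustration of the kind of loop I mean (made-up names, not 
necessarily what is happening in your setup): two zones delegated to 
name servers inside each other, with no glue available, so neither 
delegation can ever be followed:

; in the com zone
example.com.   NS   ns1.example.net.
; in the net zone
example.net.   NS   ns1.example.com.

Resolving anything under example.com needs the address of 
ns1.example.net, which needs example.net, which needs the address of 
ns1.example.com, which needs example.com again.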

The log message is new in 9.18, but the 10-second delay and SERVFAIL
response would probably have happened in earlier releases as well.

-- 
Evan Hunt -- e...@isc.org
Internet Systems Consortium, Inc.


rpz testing -> shut down hung fetch while resolving

2023-01-26 Thread Havard Eidnes via bind-users
Hi,

I recently made an upgrade of BIND to version 9.18.11 on our
resolver cluster, following the recent announcement.  Shortly
thereafter I received reports that lookups of "known entries" in our
quite small RPZ feed (it's around 1 MB on-disk) no longer succeed as
expected; instead they take a long time, finally return SERVFAIL to
the client, and along with this we get this log message:

Jan 26 18:41:27 xxx-res named[6179]: shut down hung fetch while resolving 
'known-rpz-entry.no/A'
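
From a client's point of view the failure looks roughly like this (the 
resolver address below is a placeholder, not a captured session):

$ dig @192.0.2.1 known-rpz-entry.no A
# takes a long time, then comes back with status: SERVFAIL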

Initially I thought this was new behaviour between BIND 9.18.10 and
9.18.11, but after downgrading one of the affected nodes to 9.18.10,
the problem is still observable there. Also, only a subset of our 4
nodes exhibits this behaviour, despite the unaffected ones also running
9.18.11, which is quite strange. None of the name servers are under
severe strain by any measure -- one affected node sees around 200 qps,
another around 50 qps at the time of writing.

I want to ask whether this sort of issue is already known (I briefly
searched the issues on ISC's GitLab and came up empty), and also whether
there is any particular sort of information I should collect to narrow
this down if it is a new issue.
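
One thing I can easily capture while a lookup is hanging is the list of
in-progress recursive fetches, roughly:

$ rndc recursing
$ cat named.recursing     # in the working directory, or wherever recursing-file points

but I'd welcome pointers to anything more useful.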

Regards,

- Håvard


lame-servers: info: no valid RRSIG resolving

2023-01-26 Thread duluxoz

Hi All,

Sorry for asking what is almost certainly a "noob" question, but I'm 
seeing a lot of "lame-servers: info: no valid RRSIG resolving 
'./NS/IN':" messages in our auth_servers.log for the DNS root servers' 
IPv4 addresses. Is this normal, or do we have an issue that we need to 
resolve?
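
For reference, I believe the query named is complaining about can be 
reproduced by hand with something like this (any root server should do):

$ dig +dnssec +norecurse . NS @a.root-servers.net
# the answer section should contain the root NS RRset plus an RRSIG covering it

That would at least show whether the signature is coming back from the 
root servers at all.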


Thanks for the feedback

Cheers

Dulux-Oz