More investigation results:

I think the issue appears when unbound is trying to probe the master servers for the auth_zone section. The logs show unbound doing lookups on all the auth_zone domain names in my config, and I think unbound is answering them from its own
cache. After the lookups, I get the following in the logs:

May 18 14:59:11 homebrew unbound[30620]: [30620:0] info: mesh_run: end 13 recursion states (462 with reply, 13 detached), 13 waiting replies, 1444 recursion replies sent, 1 replies dropped, 0 states jostled out
[*snip* histogram]
May 18 14:59:11 homebrew unbound[30620]: [30620:0] info: 0RDd mod0 b.root-servers.net. A IN May 18 14:59:11 homebrew unbound[30620]: [30620:0] info: 1RDd mod0 c.root-servers.net. A IN May 18 14:59:11 homebrew unbound[30620]: [30620:0] info: 2RDd mod0 f.root-servers.net. A IN May 18 14:59:11 homebrew unbound[30620]: [30620:0] info: 3RDd mod0 g.root-servers.net. A IN May 18 14:59:11 homebrew unbound[30620]: [30620:0] info: 4RDd mod0 k.root-servers.net. A IN May 18 14:59:11 homebrew unbound[30620]: [30620:0] info: 5RDd mod0 xfr.cjr.dns.icann.org. A IN May 18 14:59:11 homebrew unbound[30620]: [30620:0] info: 6RDd mod0 xfr.lax.dns.icann.org. A IN May 18 14:59:11 homebrew unbound[30620]: [30620:0] info: 7RDd mod0 b.root-servers.net. AAAA IN May 18 14:59:11 homebrew unbound[30620]: [30620:0] info: 8RDd mod0 c.root-servers.net. AAAA IN May 18 14:59:11 homebrew unbound[30620]: [30620:0] info: 9RDd mod0 g.root-servers.net. AAAA IN May 18 14:59:11 homebrew unbound[30620]: [30620:0] info: 10RDd mod0 k.root-servers.net. AAAA IN May 18 14:59:11 homebrew unbound[30620]: [30620:0] info: 11RDd mod0 xfr.cjr.dns.icann.org. AAAA IN May 18 14:59:11 homebrew unbound[30620]: [30620:0] info: 12RDd mod0 xfr.lax.dns.icann.org. AAAA IN May 18 14:59:11 homebrew unbound[30620]: [30620:0] info: mesh_run: end 12 recursion states (462 with reply, 12 detached), 12 waiting replies, 1444 recursion replies sent, 1 replies dropped, 0 states jostled out
The above pattern repeats, with the last entry in the RDd list dropping off, and the `detached` and `waiting reply` counters going down by one each time. Once
the list is emptied I get:

⋮
May 18 14:59:11 homebrew unbound[30620]: [30620:0] info: mesh_run: end 1 recursion states (462 with reply, 1 detached), 1 waiting replies, 1444 recursion replies sent, 1 replies dropped, 0 states jostled out
[*snip* histogram]
0RDd mod0  b.root-servers.net. A IN
May 18 14:59:11 homebrew unbound[30620]: [30620:0] info: mesh_run: end 1 recursion states (462 with reply, 0 detached), 0 waiting replies, 1444 recursion replies sent, 1 replies dropped, 0 states jostled out
[*snip* histogram]
May 18 14:59:11 homebrew unbound[30620]: [30620:0] debug: auth zone .: soa probe serial is 2018051800 May 18 14:59:11 homebrew unbound[30620]: [30620:0] debug: auth_zone unchanged, new lease, wait May 18 14:59:11 homebrew unbound[30620]: [30620:0] debug: auth zone . timeout in 1800 seconds
May 18 14:59:11 homebrew unbound[30620]: [30620:0] debug: close fd 15

From this point onward, queries are rejected. Half an hour later, the auth-zone lookup attempt repeats, and it all looks like above. 4 hours later, I get a
bunch of lines like this:

May 18 16:59:11 homebrew unbound[30620]: [30620:0] debug: comm point start listening 15 May 18 16:59:11 homebrew unbound[30620]: [30620:0] debug: Reading tcp query of length 17540 May 18 16:59:11 homebrew unbound[30620]: [30620:0] debug: comm point stop listening 15 May 18 16:59:11 homebrew unbound[30620]: [30620:0] debug: comm point start listening 15
May 18 16:59:12 homebrew unbound[30620]: [30620:0] debug: Reading tcp query of length 6523 May 18 16:59:12 homebrew unbound[30620]: [30620:0] debug: comm point stop listening 15 May 18 16:59:12 homebrew unbound[30620]: [30620:0] debug: xfr b.root-servers.net: last AXFR packet

Followed by a big list:

May 18 16:59:12 homebrew unbound[30620]: [30620:0] debug: apply_axfr at[0] . SOA May 18 16:59:12 homebrew unbound[30620]: [30620:0] debug: apply_axfr at[1] . RRSIG
May 18 16:59:12 homebrew unbound[30620]: [30620:0] debug: apply_axfr at[3028] camera. DS May 18 16:59:57 homebrew systemd-journald[199]: Suppressed 19535 messages from unbound.service

Which looks like it got new records for the auth_zone? I'm not sure. It keeps
looping the auth_zone lookups every half hour still.

I added some logging to mesh_make_new_space(), which showed that no jostle is attempted because mesh->jostle_first is a nullptr. I don't know if that's
normal.

Since it seems to coincide with the auth-zone lookups, I will try disabling it
to see if the issue resolves itself.

Reply via email to