On 7/11/20 11:49 AM, Andrew Forgue via Unbound-users wrote:
I have an unbound server that acts as a recursive resolver for clients and also acts as a
target for fully delegated DNS (i.e. unbound is the NS record). For the fully-delegated
domain it is a simple stub zone with an upstream of localhost on a different port. Let's
call it "blah.example.com".
Occasionally, unbound (this has happened on versions 1.10.1 and 1.7.3) will start
responding to non-recursive queries with the list of root servers instead of a
response from the stub-zone. Clients that set the `rd` (recursion desired) flag are
fine and continue to receive correct records from unbound (via the stub
server). Seemingly all records in all stub zones show this behavior
simultaneously.
I don't know what triggers it, but a full restart of unbound is the only thing
that fixes it. I've tried flushing the cache, flushing infra, and everything else;
nothing seems to matter. I've seen only two things that may point to the issue.
- With verbosity turned up to 10, there's an entry produced in strace (but not in the
actual log - maybe a misconfig): "unbound[2213085:5] debug: answer from the cache
failed"
- stracing the "broken" unbound process shows a very tight loop of recvmsg() (of the
request) and sendmsg() (with the root servers) with no syscalls in between.
Again, using dig with +recurse works all the time, even when unbound gets into
this state. So it seems like an unbound bug / cache corruption or something?
If it is a bug, you may want to try a workaround while waiting for a
fix. You could try "auth-zone:" instead of "stub-zone:", or as a
companion to "stub-zone:". You may need to give the authoritative server
permission for a wholesale zone transfer to the Unbound instance. This
may help avoid some undiscovered bug in piecemeal zone recursion.
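
Something along these lines might be a starting point. It is an untested sketch, not a known-good config: you did not mention the port your local authoritative server listens on, so the 127.0.0.1@5353 address below is only a placeholder, and the zone name is the "blah.example.com" stand-in from your message. On newer releases "primary:" should be accepted in place of "master:".

```
auth-zone:
    name: "blah.example.com"
    # Placeholder address@port of the local authoritative server (assumed);
    # that server must allow zone transfers to this Unbound instance.
    master: 127.0.0.1@5353
    # Answer client queries, including non-recursive ones, from the zone copy.
    for-downstream: yes
    # Let the resolver itself use the zone copy for lookups in this zone.
    for-upstream: yes
    # Fall back to normal resolution if lookups in the zone copy fail.
    fallback-enabled: yes
```

With "for-downstream: yes" the answers for that zone come from the transferred copy, so they should not depend on whether the client set the `rd` flag.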
- Eric