Hi there,

I have an unbound server that acts as a recursive resolver for clients and also 
serves as the target of a full delegation (i.e. the delegating zone's NS record 
points at this unbound server).

For the delegated domain, unbound is configured with a simple stub zone whose 
upstream is localhost on a different port.  Let's call it "blah.example.com".

Occasionally, unbound (this has happened on versions 1.10.1 and 1.7.3) starts 
responding to non-recursive queries with the list of root servers instead of an 
answer from the stub zone.  Clients that set the `rd` (recursion desired) flag 
are fine and can still resolve records in the stub zone; only recursion-desired 
queries get correct records from unbound (via the stub server).  All records in 
seemingly all stub zones show this behavior at the same time.
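A minimal Python sketch (standard library only) of the comparison described above: it sends the same A query once with the RD bit cleared and once with it set, then roughly classifies each reply as an answer, a referral whose authority section is owned by the root, or something else. The server address, port 53, and the name www.blah.example.com are hypothetical placeholders. The classifier only looks at the header counts and the first authority owner name, so it is a diagnostic aid under those assumptions, not a full DNS parser.

```python
import random
import socket
import struct

# Hypothetical placeholders: point these at your unbound and a name in the stub zone.
SERVER = ("127.0.0.1", 53)
QNAME = "www.blah.example.com"


def build_query(qname: str, rd: bool, qid: int) -> bytes:
    """Build a minimal DNS query for an A record, with or without the RD bit."""
    flags = 0x0100 if rd else 0x0000  # 0x0100 is the RD bit in the header flags
    header = struct.pack("!HHHHHH", qid, flags, 1, 0, 0, 0)  # QDCOUNT=1
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in qname.rstrip(".").split(".")
        if part
    )
    question = labels + b"\x00" + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def skip_name(msg: bytes, offset: int) -> int:
    """Return the offset just past an encoded domain name (handles compression pointers)."""
    while True:
        length = msg[offset]
        if length == 0:
            return offset + 1
        if length & 0xC0 == 0xC0:  # a two-byte compression pointer ends the name
            return offset + 2
        offset += 1 + length


def classify(msg: bytes) -> str:
    """Roughly classify a reply as 'answer', 'root-referral', 'other', or an rcode."""
    _qid, flags, qdcount, ancount, nscount, _arcount = struct.unpack("!HHHHHH", msg[:12])
    rcode = flags & 0x000F
    if rcode != 0:
        return f"rcode={rcode}"
    if ancount > 0:
        return "answer"
    offset = 12
    for _ in range(qdcount):
        offset = skip_name(msg, offset) + 4  # skip QTYPE and QCLASS
    # With no answers, the authority section starts here; an owner of b"\x00" is the root.
    if nscount > 0 and msg[offset] == 0:
        return "root-referral"
    return "other"


def probe(rd: bool) -> str:
    """Send one query over UDP and classify the reply."""
    qid = random.randint(0, 0xFFFF)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(2.0)
        sock.sendto(build_query(QNAME, rd, qid), SERVER)
        reply, _ = sock.recvfrom(4096)
    return classify(reply)


if __name__ == "__main__":
    for rd in (False, True):
        print(f"RD={int(rd)}: {probe(rd)}")
```

If run periodically (e.g. from cron), I would expect the broken state to show up as `RD=0: root-referral` alongside `RD=1: answer`, mirroring the difference between `dig +norecurse` and `dig +recurse`.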

I don't know what triggers it, but a full restart of unbound is the only thing 
that fixes it.  I've tried flushing the cache, flushing the infra cache, and 
everything else I could think of; nothing seems to help.

I've seen only two things that may point to the issue:

- With verbosity turned up to 10, the following entry shows up in the strace 
output (but not in the actual log, which may be a misconfiguration on my part):

        "unbound[2213085:5] debug: answer from the cache failed"

- Stracing the "broken" unbound process shows a very tight loop of recvmsg() 
(receiving the request) and sendmsg() (sending the reply with the root servers) 
with no other syscalls in between.

Again, using dig with +recurse works every time, even when unbound is in this 
state.  So it seems like an unbound bug, cache corruption, or something similar?

Any ideas?
