Hi Wolfgang, I observe the same increased SERVFAILs ("misc failure") after updating to Unbound 1.22.0. Also on a low-volume recursor.
I have not had the opportunity to take a closer look, but wanted to provide anecdotal evidence that you are not alone!

Cheers, Otto

Wolfgang Breyha via Unbound-users wrote:
Hi!

I'm operating a small private (low volume) recursor for my own purposes and have been running unbound for years, since about 1.6.x, without (recognized) issues so far. But with 1.22+ I noticed some oddities with unexpected SERVFAILs. Incoming requests are made with DoT on port 853 and locally (classic port 53). My config mostly uses defaults, except [0].

I first noticed it with failed mail reception from GMX, because unbound occasionally was not able to resolve the PTR RRs of their outgoing mail relays. The log (verbosity 1, log-servfail: yes) showed only:

error: SERVFAIL <18.15.227.212.in-addr.arpa. PTR IN>: misc failure

A closer look at the logs showed a lot of rather odd "misc failure"s, e.g.:

error: SERVFAIL <ctldl.windowsupdate.com. AAAA IN>: misc failure
error: SERVFAIL <alexa.amazon.de. A IN>: misc failure
error: SERVFAIL <www.paypal.com. A IN>: misc failure

All of them worked on a later retry, as expected. I searched the source for the "misc failure" message and found the new (at least to me) option "max-global-quota" as one possible cause. Afterwards I raised the verbosity to 3 to see more details. At the same time I added

msg-cache-size: 4m
num-queries-per-thread: 4096
rrset-cache-size: 8m
cache-min-ttl: 10
cache-max-negative-ttl: 3600
infra-cache-min-rtt: 100

to [0], but I still didn't change the "max-global-quota" default. To my surprise this also lowered the "misc failure" rate, and only some "in-addr.arpa" lookups SERVFAILed with it. They all triggered the "request xxxx has exceeded the maximum global quota on number of upstream queries yyy" message in the debug log.

I then removed the modifications from the config again and returned to plain [0], and the raised rate of "misc failure"s, including quite prominent zones, returned as well. E.g.:

debug: request 3.pool.ntp.org. has exceeded the maximum global quota on number of upstream queries 155
debug: return error response SERVFAIL

Searching for the highest "number of upstream queries" gave 180, for:

error: SERVFAIL <at.mirrors.cicku.me. AAAA IN>: misc failure

This one failed again when I retried while writing this mail, with "139". The second try gave the correct answer. Obviously the cache size, and primarily its contents, influences the maximum number of upstream queries needed.

I'm wondering if I'm the only one seeing this? IMO either the default of 128 is simply too low for low-volume recursors, or there is some other oddity with this option.

Greetings, Wolfgang Breyha

[0] config (stripped access, tls keys, common stuff)

outgoing-port-permit: 32768-60999
outgoing-port-avoid: 0-32767
so-rcvbuf: 4m
so-sndbuf: 4m
so-reuseport: yes
ip-transparent: yes
max-udp-size: 4096
log-servfail: yes
harden-glue: yes
harden-dnssec-stripped: yes
harden-below-nxdomain: yes
harden-referral-path: yes
qname-minimisation: yes
aggressive-nsec: yes
use-caps-for-id: no
unwanted-reply-threshold: 10000000
prefetch: yes
prefetch-key: yes
rrset-roundrobin: yes
minimal-responses: no
val-clean-additional: yes
val-permissive-mode: no
serve-expired: no
val-log-level: 1
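For anyone else hitting the same quota messages: since "max-global-quota" is an ordinary server option, the limit can be raised above the 128 default as a workaround while the right default is discussed. A minimal sketch; the value 512 is only an illustrative guess on my part, not a tested or recommended setting:

```
server:
    # max-global-quota caps the number of upstream queries spawned
    # while resolving a single client request (option new in 1.22.0,
    # default 128). 512 here is an illustrative guess, not a
    # recommendation; pick a value based on your own debug logs.
    max-global-quota: 512
```

With verbosity 3 and log-servfail enabled, the "has exceeded the maximum global quota" debug lines show the actual counts reached, which should help in choosing a value.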