Hi,
I'm running Unbound on my OpenWrt router. OpenWrt uses unbound-control to add
the DHCP leases (from odhcpd) to the DNS. This works fine, except that once in
a while unbound-control seems to get stuck and never returns, and I end up
with a large number of unbound-control processes:
# ps | grep unbound-control
2335 root 5988 S /usr/sbin/unbound-control -c /var/lib/unbound/unbound.conf local_datas_remove
2385 root 5988 S /usr/sbin/unbound-control -c /var/lib/unbound/unbound.conf local_datas
2428 root 5988 S /usr/sbin/unbound-control -c /var/lib/unbound/unbound.conf local_datas
2995 root 5988 S /usr/sbin/unbound-control -c /var/lib/unbound/unbound.conf local_datas
3839 root 5988 S unbound-control -c /var/lib/unbound/unbound.conf stats_noreset
3970 root 5988 S unbound-control -c /var/lib/unbound/unbound.conf stats_noreset
4090 root 5988 S unbound-control -c /var/lib/unbound/unbound.conf stats_noreset
25060 root 5964 S /usr/sbin/unbound-control -c /var/lib/unbound/unbound.conf local_datas_remove
25064 root 5964 S /usr/sbin/unbound-control -c /var/lib/unbound/unbound.conf local_datas_remove
28771 root 5984 S /usr/sbin/unbound-control -c /var/lib/unbound/unbound.conf local_datas_remove
29845 root 5984 S /usr/sbin/unbound-control -c /var/lib/unbound/unbound.conf local_datas_remove
30351 root 5984 S /usr/sbin/unbound-control -c /var/lib/unbound/unbound.conf local_datas
30681 root 5968 S /usr/sbin/unbound-control -c /var/lib/unbound/unbound.conf local_datas_remove
30721 root 5968 S /usr/sbin/unbound-control -c /var/lib/unbound/unbound.conf local_datas
At that point, DNS resolution also starts failing:
$ dig aaaa google.es @192.168.1.1
; <<>> DiG 9.16.1-Ubuntu <<>> aaaa google.es @192.168.1.1
;; global options: +cmd
;; connection timed out; no servers could be reached
$ dig aaaa google.es @fd81:631b:716f:10::1
; <<>> DiG 9.16.1-Ubuntu <<>> aaaa google.es @fd81:631b:716f:10::1
;; global options: +cmd
;; connection timed out; no servers could be reached
Once I manually kill all those unbound-control processes, everything starts
working again.
I have reported the problem on the OpenWrt forum:
https://forum.openwrt.org/t/issues-with-unbound-and-odhcp-setup/66354/26
The problem seems to be triggered by running multiple unbound-control
instances in parallel. OpenWrt now contains a workaround that uses a lockfile
to keep these calls from running concurrently, but I suspect there is still
some kind of bug in Unbound itself. Maybe some kind of deadlock?
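To illustrate what I mean by serializing the calls, here is a minimal sketch.
It is not the actual OpenWrt script: the run_serialized function, the lock
path and the retry limit are names I made up for this example, and it uses an
atomic mkdir as the lock instead of whatever OpenWrt really does.

```shell
#!/bin/sh
# Hypothetical helper, NOT the OpenWrt implementation: run one command at a
# time by holding a directory lock for the duration of the command.

LOCK_DIR="${LOCK_DIR:-/tmp/unbound-control.lock}"   # assumed lock location
LOCK_TRIES="${LOCK_TRIES:-30}"                      # assumed retry limit

run_serialized() {
    tries=0
    # mkdir is atomic: only one caller can create the directory.
    until mkdir "$LOCK_DIR" 2>/dev/null; do
        tries=$((tries + 1))
        # Give up after LOCK_TRIES attempts instead of piling up forever.
        if [ "$tries" -ge "$LOCK_TRIES" ]; then
            echo "run_serialized: could not get lock $LOCK_DIR" >&2
            return 1
        fi
        sleep 1
    done
    # Run the command, then release the lock and pass on its exit status.
    "$@"
    ret=$?
    rmdir "$LOCK_DIR"
    return "$ret"
}

# Example call (commented out so sourcing this file has no side effects):
# run_serialized unbound-control -c /var/lib/unbound/unbound.conf stats_noreset
```

Note that this only hides the symptom. If the wrapped command itself hangs,
the lock is never released, so the other callers time out rather than queue
up indefinitely.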
Any ideas what could be the problem here?
Jef