Hi Eduard,

It is hard to say why this happens to you periodically. Do you see an increase in incoming queries when it happens? Could Unbound be running out of some buffer space? Or do you by any chance periodically perform an expensive operation on Unbound, like running dump_cache from cron? Are any errors written to the log?
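
One rough way to test the buffer-space theory on Linux is to watch the kernel's UDP counters in /proc/net/snmp while the timeouts occur. RcvbufErrors counts datagrams dropped because a socket receive buffer was full. The sketch below is only an illustration: the function names are made up for this example, and the counters are system-wide, so they cover every UDP socket on the host and not only Unbound's.

```python
# Sketch: compare UDP drop counters from /proc/net/snmp between two samples.
# All function names here are hypothetical helpers, not part of Unbound.

def parse_udp_counters(snmp_text):
    """Return the 'Udp:' counters from /proc/net/snmp text as a dict of ints."""
    header = None
    for line in snmp_text.splitlines():
        # 'UdpLite:' lines do not match this prefix, so they are skipped.
        if not line.startswith("Udp:"):
            continue
        fields = line.split()[1:]
        if header is None:
            header = fields  # first 'Udp:' line holds the column names
        else:
            return dict(zip(header, (int(v) for v in fields)))
    return {}


def udp_drop_delta(before, after):
    """Return how much each drop-related counter grew between two samples."""
    keys = ("InErrors", "RcvbufErrors", "SndbufErrors")
    return {k: after.get(k, 0) - before.get(k, 0) for k in keys}


def read_udp_counters(path="/proc/net/snmp"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as f:
        return parse_udp_counters(f.read())
```

To use it, call read_udp_counters() once, wait through a period in which clients see timeouts, call it again, and pass both results to udp_drop_delta(). A growing RcvbufErrors value would suggest that receive buffers are overflowing somewhere on the host; a flat value points away from that explanation.
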
-- Ralph

On 11-07-19 10:34, Eduard Ahmatgareev via Unbound-users wrote:
> Hi everyone,
>
> I have run into an interesting issue with our Unbound server and could
> not figure it out without your help.
> We use Unbound as the primary DNS resolver in our AWS infrastructure, but
> from time to time the Unbound server does not respond to queries from our
> clients.
> With tcpdump and Wireshark I also found a lot of retransmitted DNS
> requests from clients in the subnets.
> The issue appears periodically; our clients get timeouts during the day.
> Out of 100 queries, 3-8 time out.
>
> For debugging I used the command:
> perf trace -p $(pidof unbound) --duration=10
> and got the following:
> 13.285 (599.741 ms): unbound/15943 epoll_pwait(epfd: 54<anon_inode:[eventpoll]>, events: 0x564955c6ae10, maxevents: 128, timeout: -1, sigsetsize: 8) = -1 EINTR Interrupted system call
> 616.016 (94.403 ms): unbound/15943 epoll_pwait(epfd: 54<anon_inode:[eventpoll]>, events: 0x564955c6ae10, maxevents: 128, timeout: -1, sigsetsize: 8) = 1
> 710.662 (130.206 ms): unbound/15943 epoll_pwait(epfd: 54<anon_inode:[eventpoll]>, events: 0x564955c6ae10, maxevents: 128, timeout: -1, sigsetsize: 8) = 1
> 616.649 (224.502 ms): unbound/15952 epoll_pwait(epfd: 42<anon_inode:[eventpoll]>, events: 0x7faea89ea7f0, maxevents: 128, timeout: -1, sigsetsize: 8) = 1
> 850.606 (112.947 ms): unbound/15952 epoll_pwait(epfd: 42<anon_inode:[eventpoll]>, events: 0x7faea89ea7f0, maxevents: 128, timeout: -1, sigsetsize: 8) = 1
> 13.453 (1160.129 ms): unbound/15951 epoll_pwait(epfd: 37<anon_inode:[eventpoll]>, events: 0x7faea47ca3e0, maxevents: 64, timeout: -1, sigsetsize: 8) = 1
> 840.904 (335.113 ms): unbound/15943 epoll_pwait(epfd: 54<anon_inode:[eventpoll]>, events: 0x564955c6ae10, maxevents: 128, timeout: -1, sigsetsize: 8) = 1
> 710.891 (465.469 ms): unbound/15950 epoll_pwait(epfd: 36<anon_inode:[eventpoll]>, events: 0x7faeac8b2680, maxevents: 128, timeout: -1, sigsetsize: 8) = 1
> 13.769 (1174.857 ms): unbound/15954 epoll_pwait(epfd: 48<anon_inode:[eventpoll]>, events: 0x7fae98747c20, maxevents: 128, timeout: -1, sigsetsize: 8) = 1
> 1176.048 (17.121 ms): unbound/15943 epoll_pwait(epfd: 54<anon_inode:[eventpoll]>, events: 0x564955c6ae10, maxevents: 128, timeout: -1, sigsetsize: 8) = -1 EINTR Interrupted system call
> 1175.740 (21.495 ms): unbound/15951 epoll_pwait(epfd: 37<anon_inode:[eventpoll]>, events: 0x7faea47ca3e0, maxevents: 64, timeout: -1, sigsetsize: 8) = 1
> 1177.587 (19.955 ms): unbound/15950 epoll_pwait(epfd: 36<anon_inode:[eventpoll]>, events: 0x7faeac8b2680, maxevents: 128, timeout: 264, sigsetsize: 8) = 1
> 1196.914 (11.097 ms): unbound/15954 epoll_pwait(epfd: 48<anon_inode:[eventpoll]>, events: 0x7fae98747c20, maxevents: 128, timeout: -1, sigsetsize: 8) = 1
>
> Our infra:
> ec2: c5.2xlarge (16gb mem, 8cores, 60gb gp2)
> dist: amazon linux 2
>
> unbound-libs-1.6.6-1.amzn2.0.2.x86_64
> unbound-python-1.6.6-1.amzn2.0.2.x86_64
> unbound-1.6.6-1.amzn2.0.2.x86_64
>
> conf:
> server:
>     verbosity: 1
>     num-threads: 8
>     statistics-interval: 0
>     extended-statistics: yes
>     statistics-cumulative: no
>     msg-cache-slabs: 4
>     rrset-cache-slabs: 4
>     infra-cache-slabs: 4
>     key-cache-slabs: 4
>     rrset-cache-size: 100m
>     msg-cache-size: 50m
>     so-rcvbuf: 4m
>     so-sndbuf: 4m
>     so-reuseport: yes
>     outgoing-range: 8192
>     num-queries-per-thread: 4096
>     do-daemonize: no
>     prefetch: yes
>     rrset-roundrobin: yes
>     logfile: ""
>     use-syslog: no
>     directory: "/etc/unbound"
>     chroot: ""
>     log-queries: no
>     access-control: 0.0.0.0/0 allow
>     interface: 0.0.0.0
>     interface-automatic: yes
>     port: 53
>     do-ip4: yes
>     do-ip6: no
>     do-udp: yes
>     do-tcp: yes
>     username: "unbound"
>     pidfile: "/var/run/unbound/unbound.pid"
>     root-hints: /etc/unbound/root.hints
>     key-cache-size: 32m
>     local-zone: "10.in-addr.arpa." nodefault
>
> remote-control:
>     control-enable: yes
>
> Any ideas?
