Hi Ralph,

Thank you for the response. Running dump_cache from cron is a good idea. I can probably combine it with the command that dumps the request list, and add both to the cron job I already use to collect tcpdump traffic.

For now, this is what I know: Amazon does not handle NXDOMAIN well. When a query arrives for a nonexistent domain, unbound forwards it to the AWS VPC DNS server, and AWS takes a long time to return an answer. This may be the cause of our issue, but I am not 100% sure.
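A minimal sketch of the cron-driven collection described above, so that the request list and the cache dump are saved with a timestamp that can be lined up with the tcpdump captures. The names `collect`, `snapshot_name`, and the output directory are hypothetical; the only external commands assumed are `unbound-control dump_requestlist` and `unbound-control dump_cache`, and `remote-control: control-enable: yes` is assumed to be set as in the quoted config. Ralph's reply describes dump_cache as an expensive operation, so it may be worth running it less often than the request list dump.

```python
import datetime
import pathlib
import subprocess

# unbound-control subcommands to snapshot on every cron run.
# dump_cache can be large and expensive; drop it here if it causes stalls.
COMMANDS = ("dump_requestlist", "dump_cache")


def snapshot_name(command, now):
    """Build a sortable file name, e.g. dump_cache-20190711T123600.txt."""
    return f"{command}-{now:%Y%m%dT%H%M%S}.txt"


def collect(out_dir, run=subprocess.run, now=None):
    """Run each command once and write its stdout to a file in out_dir.

    `run` is injectable so the function can be exercised without unbound.
    Returns the list of files written.
    """
    now = now or datetime.datetime.now()
    out_dir = pathlib.Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for command in COMMANDS:
        result = run(
            ["unbound-control", command],
            capture_output=True,
            text=True,
            timeout=30,
        )
        path = out_dir / snapshot_name(command, now)
        path.write_text(result.stdout)
        written.append(path)
    return written
```

From cron, this could be invoked once a minute by a small wrapper that calls `collect("/var/tmp/unbound-snapshots")` (the path is only an example); the shared timestamp in the file names makes it easier to match a snapshot to the moment a client timeout shows up in the capture.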
Thu, 11 Jul 2019 at 12:36, Ralph Dolmans via Unbound-users <[email protected]>:

> Hi Eduard,
>
> Hard to say why this happens periodically to you. Do you see an increase
> in the incoming queries when this happens? Maybe running out of some
> buffer space? Or do you by any chance periodically perform an expensive
> operation on unbound, like doing a dump_cache from cron? Are there any
> errors written to the log?
>
> -- Ralph
>
> On 11-07-19 10:34, Eduard Ahmatgareev via Unbound-users wrote:
> > Hi everyone,
> >
> > I ran into an interesting issue with our unbound server and couldn't
> > figure it out without your help.
> > We use unbound as the primary DNS resolver in our AWS infrastructure,
> > but from time to time the unbound server does not respond to queries
> > from our clients.
> > With tcpdump and Wireshark I also found a lot of retransmitted DNS
> > requests from clients in the subnets.
> > The issue occurs periodically: our clients get timeouts during the day.
> > Out of 100 queries, 3-8 time out.
> >
> > For debugging I used the command:
> > perf trace -p $(pidof unbound) --duration=10
> > and got the following:
> > 13.285 (599.741 ms): unbound/15943 epoll_pwait(epfd: 54<anon_inode:[eventpoll]>, events: 0x564955c6ae10, maxevents: 128, timeout: -1, sigsetsize: 8) = -1 EINTR Interrupted system call
> > 616.016 (94.403 ms): unbound/15943 epoll_pwait(epfd: 54<anon_inode:[eventpoll]>, events: 0x564955c6ae10, maxevents: 128, timeout: -1, sigsetsize: 8) = 1
> > 710.662 (130.206 ms): unbound/15943 epoll_pwait(epfd: 54<anon_inode:[eventpoll]>, events: 0x564955c6ae10, maxevents: 128, timeout: -1, sigsetsize: 8) = 1
> > 616.649 (224.502 ms): unbound/15952 epoll_pwait(epfd: 42<anon_inode:[eventpoll]>, events: 0x7faea89ea7f0, maxevents: 128, timeout: -1, sigsetsize: 8) = 1
> > 850.606 (112.947 ms): unbound/15952 epoll_pwait(epfd: 42<anon_inode:[eventpoll]>, events: 0x7faea89ea7f0, maxevents: 128, timeout: -1, sigsetsize: 8) = 1
> > 13.453 (1160.129 ms): unbound/15951 epoll_pwait(epfd: 37<anon_inode:[eventpoll]>, events: 0x7faea47ca3e0, maxevents: 64, timeout: -1, sigsetsize: 8) = 1
> > 840.904 (335.113 ms): unbound/15943 epoll_pwait(epfd: 54<anon_inode:[eventpoll]>, events: 0x564955c6ae10, maxevents: 128, timeout: -1, sigsetsize: 8) = 1
> > 710.891 (465.469 ms): unbound/15950 epoll_pwait(epfd: 36<anon_inode:[eventpoll]>, events: 0x7faeac8b2680, maxevents: 128, timeout: -1, sigsetsize: 8) = 1
> > 13.769 (1174.857 ms): unbound/15954 epoll_pwait(epfd: 48<anon_inode:[eventpoll]>, events: 0x7fae98747c20, maxevents: 128, timeout: -1, sigsetsize: 8) = 1
> > 1176.048 (17.121 ms): unbound/15943 epoll_pwait(epfd: 54<anon_inode:[eventpoll]>, events: 0x564955c6ae10, maxevents: 128, timeout: -1, sigsetsize: 8) = -1 EINTR Interrupted system call
> > 1175.740 (21.495 ms): unbound/15951 epoll_pwait(epfd: 37<anon_inode:[eventpoll]>, events: 0x7faea47ca3e0, maxevents: 64, timeout: -1, sigsetsize: 8) = 1
> > 1177.587 (19.955 ms): unbound/15950 epoll_pwait(epfd: 36<anon_inode:[eventpoll]>, events: 0x7faeac8b2680, maxevents: 128, timeout: 264, sigsetsize: 8) = 1
> > 1196.914 (11.097 ms): unbound/15954 epoll_pwait(epfd: 48<anon_inode:[eventpoll]>, events: 0x7fae98747c20, maxevents: 128, timeout: -1, sigsetsize: 8) = 1
> >
> > our infra:
> > ec2: c5.2xlarge (16gb mem, 8cores, 60gb gp2)
> > dist: amazon linux 2
> >
> > unbound-libs-1.6.6-1.amzn2.0.2.x86_64
> > unbound-python-1.6.6-1.amzn2.0.2.x86_64
> > unbound-1.6.6-1.amzn2.0.2.x86_64
> >
> > conf:
> > server:
> >     verbosity: 1
> >     num-threads: 8
> >     statistics-interval: 0
> >     extended-statistics: yes
> >     statistics-cumulative: no
> >     msg-cache-slabs: 4
> >     rrset-cache-slabs: 4
> >     infra-cache-slabs: 4
> >     key-cache-slabs: 4
> >     rrset-cache-size: 100m
> >     msg-cache-size: 50m
> >     so-rcvbuf: 4m
> >     so-sndbuf: 4m
> >     so-reuseport: yes
> >     outgoing-range: 8192
> >     num-queries-per-thread: 4096
> >     do-daemonize: no
> >     prefetch: yes
> >     rrset-roundrobin: yes
> >     logfile: ""
> >     use-syslog: no
> >     directory: "/etc/unbound"
> >     chroot: ""
> >     log-queries: no
> >     access-control: 0.0.0.0/0 allow
> >     interface: 0.0.0.0
> >     interface-automatic: yes
> >     port: 53
> >     do-ip4: yes
> >     do-ip6: no
> >     do-udp: yes
> >     do-tcp: yes
> >     username: "unbound"
> >     pidfile: "/var/run/unbound/unbound.pid"
> >     root-hints: /etc/unbound/root.hints
> >     key-cache-size: 32m
> >     local-zone: "10.in-addr.arpa." nodefault
> >
> > remote-control:
> >     control-enable: yes
> >
> > any ideas?
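The perf trace output quoted above is easier to compare across runs when it is reduced to the longest wait per thread. Below is a rough sketch, assuming each trace record sits on a single line; the function name `longest_wait_per_thread` and the `SAMPLE` list are illustrative, with the sample records copied from the trace. Note that with `timeout: -1`, epoll_pwait blocks until an event arrives, so a long duration here may simply mean the thread was idle rather than stuck.

```python
import re
from collections import defaultdict

# Matches the start of a perf trace record, for example:
#   13.285 (599.741 ms): unbound/15943 epoll_pwait(
RECORD = re.compile(
    r"(?P<start>\d+\.\d+) \(\s*(?P<dur>\d+\.\d+) ms\): "
    r"unbound/(?P<tid>\d+) (?P<call>\w+)\("
)


def longest_wait_per_thread(lines):
    """Return {thread id: longest syscall duration in ms} for the records."""
    longest = defaultdict(float)
    for line in lines:
        match = RECORD.search(line)
        if match is None:
            continue  # skip lines that are not trace records
        tid = int(match.group("tid"))
        longest[tid] = max(longest[tid], float(match.group("dur")))
    return dict(longest)


# Four records copied from the trace in this thread.
SAMPLE = [
    "13.285 (599.741 ms): unbound/15943 epoll_pwait(epfd: 54<anon_inode:[eventpoll]>, events: 0x564955c6ae10, maxevents: 128, timeout: -1, sigsetsize: 8) = -1 EINTR Interrupted system call",
    "616.016 (94.403 ms): unbound/15943 epoll_pwait(epfd: 54<anon_inode:[eventpoll]>, events: 0x564955c6ae10, maxevents: 128, timeout: -1, sigsetsize: 8) = 1",
    "13.453 (1160.129 ms): unbound/15951 epoll_pwait(epfd: 37<anon_inode:[eventpoll]>, events: 0x7faea47ca3e0, maxevents: 64, timeout: -1, sigsetsize: 8) = 1",
    "1175.740 (21.495 ms): unbound/15951 epoll_pwait(epfd: 37<anon_inode:[eventpoll]>, events: 0x7faea47ca3e0, maxevents: 64, timeout: -1, sigsetsize: 8) = 1",
]
```

Calling `longest_wait_per_thread(SAMPLE)` gives one number per thread id, which can be compared between a period with client timeouts and a quiet period.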
