Re: [Dnsmasq-discuss] Dnsmasq stops caching for a while on receive of failed or retried lookup?
On 12/06/18 12:21, Mark Fermor, HolidayExtras.com wrote: > Hello, > > Running dnsmasq with these options: > /usr/sbin/dnsmasq -k --cache-size=50 --log-facility=- --user=nobody > --group=nobody --no-hosts --neg-ttl=60 --max-ttl=240 --max-cache-ttl=300 > > No local dnsmasq config file so that's literally all the config other > than defaults applied by dnsmasq > > dnsmasq -v > Dnsmasq version 2.78 Copyright (c) 2000-2017 Simon Kelley > Compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 > no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify > > This is something running running in Kubernetes, they run as sidekick > containers to the main application container. I have multiple of the > same deployment running in the cluster, so they're all at the same > versions and receiving equal amounts of traffic via load balancing. They > all talk to the same endpoints outbound and do the same work load etc. > I've sent sigusr1 signal to all of the pods individually (all pods have > been running for approx 48 hours bar pod4 which has been running less > hours): > > pod1 > I0608 15:11:34.998127 1 nanny.go:116] dnsmasq[19]: time 1528470694 > I0608 15:11:34.998169 1 nanny.go:116] dnsmasq[19]: cache size 50, > 0/2267416 cache insertions re-used unexpired cache entries. > I0608 15:11:34.998175 1 nanny.go:116] dnsmasq[19]: queries > forwarded 3218560, queries answered locally 3182486 > I0608 15:11:34.998180 1 nanny.go:116] dnsmasq[19]: queries for > authoritative zones 0 > I0608 15:11:34.998184 1 nanny.go:116] dnsmasq[19]: server > 10.227.240.10#53: queries sent 3218560, retried or failed 16 > > pod2 > I0608 15:11:35.909168 1 nanny.go:116] dnsmasq[18]: time 1528470695 > I0608 15:11:35.909206 1 nanny.go:116] dnsmasq[18]: cache size 50, > 0/197465 cache insertions re-used unexpired cache entries. > I0608 15:11:35.909211 1 nanny.go:116] dnsmasq[18]: queries > forwarded 240843, queries answered locally 6159789 > I0608 15:11:35.909216 1 nanny.go:116] dnsmasq[18]: queries for > authoritative zones 0 > I0608 15:11:35.909219 1 nanny.go:116] dnsmasq[18]: server > 10.227.240.10#53: queries sent 240843, retried or failed 4 > > pod3 > I0608 15:11:36.948015 1 nanny.go:116] dnsmasq[20]: time 1528470696 > I0608 15:11:36.948083 1 nanny.go:116] dnsmasq[20]: cache size 50, > 0/63648 cache insertions re-used unexpired cache entries. > I0608 15:11:36.948138 1 nanny.go:116] dnsmasq[20]: queries > forwarded 46004, queries answered locally 6347223 > I0608 15:11:36.948188 1 nanny.go:116] dnsmasq[20]: queries for > authoritative zones 0 > I0608 15:11:36.948219 1 nanny.go:116] dnsmasq[20]: server > 10.227.240.10#53: queries sent 46004, retried or failed 1 > > pod4 > I0608 15:11:38.032330 1 nanny.go:116] dnsmasq[24]: time 1528470698 > I0608 15:11:38.032374 1 nanny.go:116] dnsmasq[24]: cache size 50, > 0/1359727 cache insertions re-used unexpired cache entries. > I0608 15:11:38.032382 1 nanny.go:116] dnsmasq[24]: queries > forwarded 1939395, queries answered locally 742411 > I0608 15:11:38.032388 1 nanny.go:116] dnsmasq[24]: queries for > authoritative zones 0 > I0608 15:11:38.032394 1 nanny.go:116] dnsmasq[24]: server > 10.227.240.10#53: queries sent 1939395, retried or failed 7 > > > The problem I notice, is some pods (pod1, pod2, pod4) are forwarding far > more requests than the other pods (an indication of what other is, would > be pod3). I'm not sure what's caused this seeing as the application is > doing the same across all pods. The only thing I notice here, is that > pods 1/2/4 all have a number of "retried or failed", which the other > pods don't. Therefore I wonder if the reason that those pods have sent > so many more requests upstream instead of hitting the cache, is because > of something to do with a "retried or failed", which then stops the > cache from working for a decent period of time? I've not been able to > find anything (google) relating to this scenario but it's the only thing > that makes sense to me right now. I can accept a couple of failures for > lookup here and there, but one failure (if i'm onto something that is), > seems to then cause no cache hits for a large period of time? > There's no reason why a retry would affect the cache, so I think you're jumping to conclusions there. The "retried or failed" counts are in the noise: don't forget that DNS uses UDP transport by default, so those numbers arise from very rare dropped UDP packets - nothing to worry about. As for the differences in cache hit rates - it could be subtle synchronisation effects, especialy as all the dnsmasq instances point to the same upstream. If one pod hits a particular DNS name first, then that name will end up with a longer TTL in that pod than the others, which get the record later from the upstream after some of the TTL has run down. If something makes that happen a lot, that would
[Dnsmasq-discuss] Dnsmasq stops caching for a while on receive of failed or retried lookup?
Hello, Running dnsmasq with these options: /usr/sbin/dnsmasq -k --cache-size=50 --log-facility=- --user=nobody --group=nobody --no-hosts --neg-ttl=60 --max-ttl=240 --max-cache-ttl=300 No local dnsmasq config file so that's literally all the config other than defaults applied by dnsmasq dnsmasq -v Dnsmasq version 2.78 Copyright (c) 2000-2017 Simon Kelley Compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify This is something running running in Kubernetes, they run as sidekick containers to the main application container. I have multiple of the same deployment running in the cluster, so they're all at the same versions and receiving equal amounts of traffic via load balancing. They all talk to the same endpoints outbound and do the same work load etc. I've sent sigusr1 signal to all of the pods individually (all pods have been running for approx 48 hours bar pod4 which has been running less hours): pod1 I0608 15:11:34.998127 1 nanny.go:116] dnsmasq[19]: time 1528470694 I0608 15:11:34.998169 1 nanny.go:116] dnsmasq[19]: cache size 50, 0/2267416 cache insertions re-used unexpired cache entries. I0608 15:11:34.998175 1 nanny.go:116] dnsmasq[19]: queries forwarded 3218560, queries answered locally 3182486 I0608 15:11:34.998180 1 nanny.go:116] dnsmasq[19]: queries for authoritative zones 0 I0608 15:11:34.998184 1 nanny.go:116] dnsmasq[19]: server 10.227.240.10#53: queries sent 3218560, retried or failed 16 pod2 I0608 15:11:35.909168 1 nanny.go:116] dnsmasq[18]: time 1528470695 I0608 15:11:35.909206 1 nanny.go:116] dnsmasq[18]: cache size 50, 0/197465 cache insertions re-used unexpired cache entries. I0608 15:11:35.909211 1 nanny.go:116] dnsmasq[18]: queries forwarded 240843, queries answered locally 6159789 I0608 15:11:35.909216 1 nanny.go:116] dnsmasq[18]: queries for authoritative zones 0 I0608 15:11:35.909219 1 nanny.go:116] dnsmasq[18]: server 10.227.240.10#53: queries sent 240843, retried or failed 4 pod3 I0608 15:11:36.948015 1 nanny.go:116] dnsmasq[20]: time 1528470696 I0608 15:11:36.948083 1 nanny.go:116] dnsmasq[20]: cache size 50, 0/63648 cache insertions re-used unexpired cache entries. I0608 15:11:36.948138 1 nanny.go:116] dnsmasq[20]: queries forwarded 46004, queries answered locally 6347223 I0608 15:11:36.948188 1 nanny.go:116] dnsmasq[20]: queries for authoritative zones 0 I0608 15:11:36.948219 1 nanny.go:116] dnsmasq[20]: server 10.227.240.10#53: queries sent 46004, retried or failed 1 pod4 I0608 15:11:38.032330 1 nanny.go:116] dnsmasq[24]: time 1528470698 I0608 15:11:38.032374 1 nanny.go:116] dnsmasq[24]: cache size 50, 0/1359727 cache insertions re-used unexpired cache entries. I0608 15:11:38.032382 1 nanny.go:116] dnsmasq[24]: queries forwarded 1939395, queries answered locally 742411 I0608 15:11:38.032388 1 nanny.go:116] dnsmasq[24]: queries for authoritative zones 0 I0608 15:11:38.032394 1 nanny.go:116] dnsmasq[24]: server 10.227.240.10#53: queries sent 1939395, retried or failed 7 The problem I notice, is some pods (pod1, pod2, pod4) are forwarding far more requests than the other pods (an indication of what other is, would be pod3). I'm not sure what's caused this seeing as the application is doing the same across all pods. The only thing I notice here, is that pods 1/2/4 all have a number of "retried or failed", which the other pods don't. Therefore I wonder if the reason that those pods have sent so many more requests upstream instead of hitting the cache, is because of something to do with a "retried or failed", which then stops the cache from working for a decent period of time? I've not been able to find anything (google) relating to this scenario but it's the only thing that makes sense to me right now. I can accept a couple of failures for lookup here and there, but one failure (if i'm onto something that is), seems to then cause no cache hits for a large period of time? Thanks and Best, Mark ___ Dnsmasq-discuss mailing list Dnsmasq-discuss@lists.thekelleys.org.uk http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss
[Dnsmasq-discuss] Dnsmasq stops caching for a while on receive of failed or retried lookup?
Hello, Running dnsmasq with these options: /usr/sbin/dnsmasq -k --cache-size=50 --log-facility=- --user=nobody --group=nobody --no-hosts --neg-ttl=60 --max-ttl=240 --max-cache-ttl=300 No local dnsmasq config file so that's literally all the config other than defaults applied by dnsmasq dnsmasq -v Dnsmasq version 2.78 Copyright (c) 2000-2017 Simon Kelley Compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify This is something running running in Kubernetes, they run as sidekick containers to the main application container. I have 5x of the same deployment running in the cluster, so they're all at the same versions and receiving equal amounts of traffic via load balancing. They all talk to the same endpoints outbound and do the same work load etc. I've sent sigusr1 signal to all of the 5x pods individually with a 60 second interval between sending the signal and this was the output (all pods have been running for approx 48 hours bar pod4 which has been running less hours): pod1 I0608 15:10:30.222091 1 nanny.go:116] dnsmasq[19]: time 1528470630 I0608 15:10:30.222843 1 nanny.go:116] dnsmasq[19]: cache size 50, 0/2267390 cache insertions re-used unexpired cache entries. I0608 15:10:30.222862 1 nanny.go:116] dnsmasq[19]: queries forwarded 3218532, queries answered locally 3179484 I0608 15:10:30.222868 1 nanny.go:116] dnsmasq[19]: queries for authoritative zones 0 I0608 15:10:30.222874 1 nanny.go:116] dnsmasq[19]: server 10.227.240.10#53: queries sent 3218532, retried or failed 16 I0608 15:11:34.998127 1 nanny.go:116] dnsmasq[19]: time 1528470694 I0608 15:11:34.998169 1 nanny.go:116] dnsmasq[19]: cache size 50, 0/2267416 cache insertions re-used unexpired cache entries. I0608 15:11:34.998175 1 nanny.go:116] dnsmasq[19]: queries forwarded 3218560, queries answered locally 3182486 I0608 15:11:34.998180 1 nanny.go:116] dnsmasq[19]: queries for authoritative zones 0 I0608 15:11:34.998184 1 nanny.go:116] dnsmasq[19]: server 10.227.240.10#53: queries sent 3218560, retried or failed 16 pod2 I0608 15:10:30.888610 1 nanny.go:116] dnsmasq[18]: time 1528470630 I0608 15:10:30.888674 1 nanny.go:116] dnsmasq[18]: cache size 50, 0/197436 cache insertions re-used unexpired cache entries. I0608 15:10:30.888705 1 nanny.go:116] dnsmasq[18]: queries forwarded 240814, queries answered locally 6156846 I0608 15:10:30.888739 1 nanny.go:116] dnsmasq[18]: queries for authoritative zones 0 I0608 15:10:30.888768 1 nanny.go:116] dnsmasq[18]: server 10.227.240.10#53: queries sent 240814, retried or failed 4 I0608 15:11:35.909168 1 nanny.go:116] dnsmasq[18]: time 1528470695 I0608 15:11:35.909206 1 nanny.go:116] dnsmasq[18]: cache size 50, 0/197465 cache insertions re-used unexpired cache entries. I0608 15:11:35.909211 1 nanny.go:116] dnsmasq[18]: queries forwarded 240843, queries answered locally 6159789 I0608 15:11:35.909216 1 nanny.go:116] dnsmasq[18]: queries for authoritative zones 0 I0608 15:11:35.909219 1 nanny.go:116] dnsmasq[18]: server 10.227.240.10#53: queries sent 240843, retried or failed 4 pod3 I0608 15:10:31.492089 1 nanny.go:116] dnsmasq[20]: time 1528470631 I0608 15:10:31.492127 1 nanny.go:116] dnsmasq[20]: cache size 50, 0/63615 cache insertions re-used unexpired cache entries. I0608 15:10:31.492132 1 nanny.go:116] dnsmasq[20]: queries forwarded 45974, queries answered locally 6344157 I0608 15:10:31.492137 1 nanny.go:116] dnsmasq[20]: queries for authoritative zones 0 I0608 15:10:31.492143 1 nanny.go:116] dnsmasq[20]: server 10.227.240.10#53: queries sent 45974, retried or failed 1 I0608 15:11:36.948015 1 nanny.go:116] dnsmasq[20]: time 1528470696 I0608 15:11:36.948083 1 nanny.go:116] dnsmasq[20]: cache size 50, 0/63648 cache insertions re-used unexpired cache entries. I0608 15:11:36.948138 1 nanny.go:116] dnsmasq[20]: queries forwarded 46004, queries answered locally 6347223 I0608 15:11:36.948188 1 nanny.go:116] dnsmasq[20]: queries for authoritative zones 0 I0608 15:11:36.948219 1 nanny.go:116] dnsmasq[20]: server 10.227.240.10#53: queries sent 46004, retried or failed 1 pod4 I0608 15:10:32.290488 1 nanny.go:116] dnsmasq[24]: time 1528470632 I0608 15:10:32.290564 1 nanny.go:116] dnsmasq[24]: cache size 50, 0/1358169 cache insertions re-used unexpired cache entries. I0608 15:10:32.290597 1 nanny.go:116] dnsmasq[24]: queries forwarded 1937179, queries answered locally 741609 I0608 15:10:32.290629 1 nanny.go:116] dnsmasq[24]: queries for authoritative zones 0 I0608 15:10:32.290652 1 nanny.go:116] dnsmasq[24]: server 10.227.240.10#53: queries sent 1937179, retried or failed 7 I0608 15:11:38.032330 1 nanny.go:116] dnsmasq[24]: time 1528470698 I0608 15:11:38.032374 1