Re: [Dnsmasq-discuss] Dnsmasq stops caching for a while on receive of failed or retried lookup?

2018-06-12 Thread Simon Kelley
On 12/06/18 12:21, Mark Fermor, HolidayExtras.com wrote:
> Hello,
> 
> Running dnsmasq with these options:
> /usr/sbin/dnsmasq -k --cache-size=50 --log-facility=- --user=nobody
> --group=nobody --no-hosts --neg-ttl=60 --max-ttl=240 --max-cache-ttl=300
> 
> No local dnsmasq config file so that's literally all the config other
> than defaults applied by dnsmasq
> 
> dnsmasq -v
> Dnsmasq version 2.78  Copyright (c) 2000-2017 Simon Kelley
> Compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6
> no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
> 
> This is something running running in Kubernetes, they run as sidekick
> containers to the main application container. I have multiple of the
> same deployment running in the cluster, so they're all at the same
> versions and receiving equal amounts of traffic via load balancing. They
> all talk to the same endpoints outbound and do the same work load etc.
> I've sent sigusr1 signal to all of the pods individually (all pods have
> been running for approx 48 hours bar pod4 which has been running less
> hours):
> 
> pod1
> I0608 15:11:34.998127       1 nanny.go:116] dnsmasq[19]: time 1528470694
> I0608 15:11:34.998169       1 nanny.go:116] dnsmasq[19]: cache size 50,
> 0/2267416 cache insertions re-used unexpired cache entries.
> I0608 15:11:34.998175       1 nanny.go:116] dnsmasq[19]: queries
> forwarded 3218560, queries answered locally 3182486
> I0608 15:11:34.998180       1 nanny.go:116] dnsmasq[19]: queries for
> authoritative zones 0
> I0608 15:11:34.998184       1 nanny.go:116] dnsmasq[19]: server
> 10.227.240.10#53: queries sent 3218560, retried or failed 16
> 
> pod2
> I0608 15:11:35.909168       1 nanny.go:116] dnsmasq[18]: time 1528470695
> I0608 15:11:35.909206       1 nanny.go:116] dnsmasq[18]: cache size 50,
> 0/197465 cache insertions re-used unexpired cache entries.
> I0608 15:11:35.909211       1 nanny.go:116] dnsmasq[18]: queries
> forwarded 240843, queries answered locally 6159789
> I0608 15:11:35.909216       1 nanny.go:116] dnsmasq[18]: queries for
> authoritative zones 0
> I0608 15:11:35.909219       1 nanny.go:116] dnsmasq[18]: server
> 10.227.240.10#53: queries sent 240843, retried or failed 4
> 
> pod3
> I0608 15:11:36.948015       1 nanny.go:116] dnsmasq[20]: time 1528470696
> I0608 15:11:36.948083       1 nanny.go:116] dnsmasq[20]: cache size 50,
> 0/63648 cache insertions re-used unexpired cache entries.
> I0608 15:11:36.948138       1 nanny.go:116] dnsmasq[20]: queries
> forwarded 46004, queries answered locally 6347223
> I0608 15:11:36.948188       1 nanny.go:116] dnsmasq[20]: queries for
> authoritative zones 0
> I0608 15:11:36.948219       1 nanny.go:116] dnsmasq[20]: server
> 10.227.240.10#53: queries sent 46004, retried or failed 1
> 
> pod4
> I0608 15:11:38.032330       1 nanny.go:116] dnsmasq[24]: time 1528470698
> I0608 15:11:38.032374       1 nanny.go:116] dnsmasq[24]: cache size 50,
> 0/1359727 cache insertions re-used unexpired cache entries.
> I0608 15:11:38.032382       1 nanny.go:116] dnsmasq[24]: queries
> forwarded 1939395, queries answered locally 742411
> I0608 15:11:38.032388       1 nanny.go:116] dnsmasq[24]: queries for
> authoritative zones 0
> I0608 15:11:38.032394       1 nanny.go:116] dnsmasq[24]: server
> 10.227.240.10#53: queries sent 1939395, retried or failed 7
> 
> 
> The problem I notice, is some pods (pod1, pod2, pod4) are forwarding far
> more requests than the other pods (an indication of what other is, would
> be pod3). I'm not sure what's caused this seeing as the application is
> doing the same across all pods. The only thing I notice here, is that
> pods 1/2/4 all have a number of "retried or failed", which the other
> pods don't. Therefore I wonder if the reason that those pods have sent
> so many more requests upstream instead of hitting the cache, is because
> of something to do with a "retried or failed", which then stops the
> cache from working for a decent period of time? I've not been able to
> find anything (google) relating to this scenario but it's the only thing
> that makes sense to me right now. I can accept a couple of failures for
> lookup here and there, but one failure (if i'm onto something that is),
> seems to then cause no cache hits for a large period of time?
> 


There's no reason why a retry would affect the cache, so I think you're
jumping to conclusions there. The "retried or failed" counts are in the
noise: don't forget that DNS uses UDP transport by default, so those
numbers arise from very rare dropped UDP packets - nothing to worry about.


As for the differences in cache hit rates - it could be subtle
synchronisation effects, especialy as all the dnsmasq instances point to
the same upstream. If one pod hits a particular DNS name first, then
that name will end up with a longer TTL in that pod than the others,
which get the record later from the upstream after some of the TTL has
run down. If something makes that happen a lot, that would 

[Dnsmasq-discuss] Dnsmasq stops caching for a while on receive of failed or retried lookup?

2018-06-12 Thread Mark Fermor, HolidayExtras.com
Hello,

Running dnsmasq with these options:
/usr/sbin/dnsmasq -k --cache-size=50 --log-facility=- --user=nobody
--group=nobody --no-hosts --neg-ttl=60 --max-ttl=240 --max-cache-ttl=300

No local dnsmasq config file so that's literally all the config other than
defaults applied by dnsmasq

dnsmasq -v
Dnsmasq version 2.78  Copyright (c) 2000-2017 Simon Kelley
Compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6
no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify

This is something running running in Kubernetes, they run as sidekick
containers to the main application container. I have multiple of the same
deployment running in the cluster, so they're all at the same versions and
receiving equal amounts of traffic via load balancing. They all talk to the
same endpoints outbound and do the same work load etc. I've sent sigusr1
signal to all of the pods individually (all pods have been running for
approx 48 hours bar pod4 which has been running less hours):

pod1
I0608 15:11:34.998127   1 nanny.go:116] dnsmasq[19]: time 1528470694
I0608 15:11:34.998169   1 nanny.go:116] dnsmasq[19]: cache size 50,
0/2267416 cache insertions re-used unexpired cache entries.
I0608 15:11:34.998175   1 nanny.go:116] dnsmasq[19]: queries forwarded
3218560, queries answered locally 3182486
I0608 15:11:34.998180   1 nanny.go:116] dnsmasq[19]: queries for
authoritative zones 0
I0608 15:11:34.998184   1 nanny.go:116] dnsmasq[19]: server
10.227.240.10#53: queries sent 3218560, retried or failed 16

pod2
I0608 15:11:35.909168   1 nanny.go:116] dnsmasq[18]: time 1528470695
I0608 15:11:35.909206   1 nanny.go:116] dnsmasq[18]: cache size 50,
0/197465 cache insertions re-used unexpired cache entries.
I0608 15:11:35.909211   1 nanny.go:116] dnsmasq[18]: queries forwarded
240843, queries answered locally 6159789
I0608 15:11:35.909216   1 nanny.go:116] dnsmasq[18]: queries for
authoritative zones 0
I0608 15:11:35.909219   1 nanny.go:116] dnsmasq[18]: server
10.227.240.10#53: queries sent 240843, retried or failed 4

pod3
I0608 15:11:36.948015   1 nanny.go:116] dnsmasq[20]: time 1528470696
I0608 15:11:36.948083   1 nanny.go:116] dnsmasq[20]: cache size 50,
0/63648 cache insertions re-used unexpired cache entries.
I0608 15:11:36.948138   1 nanny.go:116] dnsmasq[20]: queries forwarded
46004, queries answered locally 6347223
I0608 15:11:36.948188   1 nanny.go:116] dnsmasq[20]: queries for
authoritative zones 0
I0608 15:11:36.948219   1 nanny.go:116] dnsmasq[20]: server
10.227.240.10#53: queries sent 46004, retried or failed 1

pod4
I0608 15:11:38.032330   1 nanny.go:116] dnsmasq[24]: time 1528470698
I0608 15:11:38.032374   1 nanny.go:116] dnsmasq[24]: cache size 50,
0/1359727 cache insertions re-used unexpired cache entries.
I0608 15:11:38.032382   1 nanny.go:116] dnsmasq[24]: queries forwarded
1939395, queries answered locally 742411
I0608 15:11:38.032388   1 nanny.go:116] dnsmasq[24]: queries for
authoritative zones 0
I0608 15:11:38.032394   1 nanny.go:116] dnsmasq[24]: server
10.227.240.10#53: queries sent 1939395, retried or failed 7


The problem I notice, is some pods (pod1, pod2, pod4) are forwarding far
more requests than the other pods (an indication of what other is, would be
pod3). I'm not sure what's caused this seeing as the application is doing
the same across all pods. The only thing I notice here, is that pods 1/2/4
all have a number of "retried or failed", which the other pods don't.
Therefore I wonder if the reason that those pods have sent so many more
requests upstream instead of hitting the cache, is because of something to
do with a "retried or failed", which then stops the cache from working for
a decent period of time? I've not been able to find anything (google)
relating to this scenario but it's the only thing that makes sense to me
right now. I can accept a couple of failures for lookup here and there, but
one failure (if i'm onto something that is), seems to then cause no cache
hits for a large period of time?

Thanks and Best,
Mark
___
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss


[Dnsmasq-discuss] Dnsmasq stops caching for a while on receive of failed or retried lookup?

2018-06-08 Thread Mark Fermor, HolidayExtras.com
Hello,

Running dnsmasq with these options:

/usr/sbin/dnsmasq -k --cache-size=50 --log-facility=- --user=nobody
--group=nobody --no-hosts --neg-ttl=60 --max-ttl=240 --max-cache-ttl=300

No local dnsmasq config file so that's literally all the config other than
defaults applied by dnsmasq

dnsmasq -v

Dnsmasq version 2.78  Copyright (c) 2000-2017 Simon Kelley

Compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6
no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify


This is something running running in Kubernetes, they run as sidekick
containers to the main application container. I have 5x of the same
deployment running in the cluster, so they're all at the same versions and
receiving equal amounts of traffic via load balancing. They all talk to the
same endpoints outbound and do the same work load etc. I've sent sigusr1
signal to all of the 5x pods individually with a 60 second interval between
sending the signal and this was the output (all pods have been running for
approx 48 hours bar pod4 which has been running less hours):

pod1

I0608 15:10:30.222091   1 nanny.go:116] dnsmasq[19]: time 1528470630

I0608 15:10:30.222843   1 nanny.go:116] dnsmasq[19]: cache size 50,
0/2267390 cache insertions re-used unexpired cache entries.

I0608 15:10:30.222862   1 nanny.go:116] dnsmasq[19]: queries forwarded
3218532, queries answered locally 3179484

I0608 15:10:30.222868   1 nanny.go:116] dnsmasq[19]: queries for
authoritative zones 0

I0608 15:10:30.222874   1 nanny.go:116] dnsmasq[19]: server
10.227.240.10#53: queries sent 3218532, retried or failed 16

I0608 15:11:34.998127   1 nanny.go:116] dnsmasq[19]: time 1528470694

I0608 15:11:34.998169   1 nanny.go:116] dnsmasq[19]: cache size 50,
0/2267416 cache insertions re-used unexpired cache entries.

I0608 15:11:34.998175   1 nanny.go:116] dnsmasq[19]: queries forwarded
3218560, queries answered locally 3182486

I0608 15:11:34.998180   1 nanny.go:116] dnsmasq[19]: queries for
authoritative zones 0

I0608 15:11:34.998184   1 nanny.go:116] dnsmasq[19]: server
10.227.240.10#53: queries sent 3218560, retried or failed 16


pod2

I0608 15:10:30.888610   1 nanny.go:116] dnsmasq[18]: time 1528470630

I0608 15:10:30.888674   1 nanny.go:116] dnsmasq[18]: cache size 50,
0/197436 cache insertions re-used unexpired cache entries.

I0608 15:10:30.888705   1 nanny.go:116] dnsmasq[18]: queries forwarded
240814, queries answered locally 6156846

I0608 15:10:30.888739   1 nanny.go:116] dnsmasq[18]: queries for
authoritative zones 0

I0608 15:10:30.888768   1 nanny.go:116] dnsmasq[18]: server
10.227.240.10#53: queries sent 240814, retried or failed 4

I0608 15:11:35.909168   1 nanny.go:116] dnsmasq[18]: time 1528470695

I0608 15:11:35.909206   1 nanny.go:116] dnsmasq[18]: cache size 50,
0/197465 cache insertions re-used unexpired cache entries.

I0608 15:11:35.909211   1 nanny.go:116] dnsmasq[18]: queries forwarded
240843, queries answered locally 6159789

I0608 15:11:35.909216   1 nanny.go:116] dnsmasq[18]: queries for
authoritative zones 0

I0608 15:11:35.909219   1 nanny.go:116] dnsmasq[18]: server
10.227.240.10#53: queries sent 240843, retried or failed 4


pod3

I0608 15:10:31.492089   1 nanny.go:116] dnsmasq[20]: time 1528470631

I0608 15:10:31.492127   1 nanny.go:116] dnsmasq[20]: cache size 50,
0/63615 cache insertions re-used unexpired cache entries.

I0608 15:10:31.492132   1 nanny.go:116] dnsmasq[20]: queries forwarded
45974, queries answered locally 6344157

I0608 15:10:31.492137   1 nanny.go:116] dnsmasq[20]: queries for
authoritative zones 0

I0608 15:10:31.492143   1 nanny.go:116] dnsmasq[20]: server
10.227.240.10#53: queries sent 45974, retried or failed 1

I0608 15:11:36.948015   1 nanny.go:116] dnsmasq[20]: time 1528470696

I0608 15:11:36.948083   1 nanny.go:116] dnsmasq[20]: cache size 50,
0/63648 cache insertions re-used unexpired cache entries.

I0608 15:11:36.948138   1 nanny.go:116] dnsmasq[20]: queries forwarded
46004, queries answered locally 6347223

I0608 15:11:36.948188   1 nanny.go:116] dnsmasq[20]: queries for
authoritative zones 0

I0608 15:11:36.948219   1 nanny.go:116] dnsmasq[20]: server
10.227.240.10#53: queries sent 46004, retried or failed 1


pod4

I0608 15:10:32.290488   1 nanny.go:116] dnsmasq[24]: time 1528470632

I0608 15:10:32.290564   1 nanny.go:116] dnsmasq[24]: cache size 50,
0/1358169 cache insertions re-used unexpired cache entries.

I0608 15:10:32.290597   1 nanny.go:116] dnsmasq[24]: queries forwarded
1937179, queries answered locally 741609

I0608 15:10:32.290629   1 nanny.go:116] dnsmasq[24]: queries for
authoritative zones 0

I0608 15:10:32.290652   1 nanny.go:116] dnsmasq[24]: server
10.227.240.10#53: queries sent 1937179, retried or failed 7

I0608 15:11:38.032330   1 nanny.go:116] dnsmasq[24]: time 1528470698

I0608 15:11:38.032374   1