Re: [Dnsmasq-discuss] Odd caching behaviour...

2019-04-04 Thread John Robson
Ok, thanks - that makes sense in terms of the 'incomplete' entry being
cached.

I might set up a couple of dns servers to simulate this at some point - I'm
going to want a reproducible setup for our own testing as well...  If I can
then I'll come back with that log...

Actually, maybe I already have it, let me check...
Nope - I have that up to the dnsmasq restart, not the software restart.
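
For the reproducible setup, one piece a test server would need is the ability to send back a reply that contains only the first CNAME. Below is a minimal sketch of building such a reply from a raw query packet with the Python standard library. The function names, the default TTL of 3600 and the names used are illustrative assumptions, and wiring it to a UDP socket with a controllable delay is left out.

```python
import struct


def encode_name(name: str) -> bytes:
    """Encode a dotted name as DNS labels (no compression)."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(txid: int, name: str, qtype: int = 1) -> bytes:
    """Build a one-question query (qtype 1 = A) with the RD bit set."""
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    return header + encode_name(name) + struct.pack("!HH", qtype, 1)


def question_end(packet: bytes) -> int:
    """Return the offset just past the first question entry."""
    pos = 12  # fixed-size DNS header
    while packet[pos] != 0:
        pos += packet[pos] + 1
    return pos + 1 + 4  # root label, then QTYPE and QCLASS


def cname_only_reply(query: bytes, target: str, ttl: int = 3600) -> bytes:
    """Answer a query with a single CNAME record and nothing else."""
    txid = struct.unpack("!H", query[:2])[0]
    end = question_end(query)
    # Flags 0x8180: QR=1, RD=1, RA=1, RCODE=0. One question, one answer.
    header = struct.pack("!HHHHHH", txid, 0x8180, 1, 1, 0, 0)
    rdata = encode_name(target)
    # 0xC00C is a compression pointer back to the question name at offset 12.
    answer = struct.pack("!HHHIH", 0xC00C, 5, 1, ttl, len(rdata)) + rdata
    return header + query[12:end] + answer


query = build_query(0x1234, "foo.bar.com")
reply = cname_only_reply(query, "foo.bar.com.edgekey.net")
```

The reply echoes the transaction id and question, so a forwarder should accept it as the answer to its query.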

Cheers,

John

On Thu, 4 Apr 2019 at 16:27, Simon Kelley wrote:

> On 30/03/2019 08:41, John Robson wrote:
> > Simon,
> >
> > The upstream server is authoritative for the initial domain (being
> > inside an organisation I don’t think that’s unusual) and the incomplete
> > (but perfectly valid, I agree) response is taken as complete. The
> > upstream server does do recursion as well, but when that failed it just
> > returned what it could (seems reasonable enough).
> >
> > I’d have thought that the lack of an actual resolved A record (which is
> > what was asked for) would mark the cache entry as incomplete at best.
> > This is pure gut, not a technically based statement.
>
> A CNAME reply with no record for the target of the CNAME, from a
> recursive server, establishes that the target doesn't exist. If it were
> otherwise, there would be large numbers of legitimate answers which are
> uncachable. Consider that there are many record types and the target of
> a CNAME will not exist for most record types.
>
> As a common example, an IPv6 enabled host will query for the AAAA record
> of something it wants to talk to. If the hostname is a CNAME, and the thing
> it wants to talk to doesn't have an AAAA record, then the reply will be
> a CNAME with no target. You really want to be able to cache that.
>
>
> >
> > And whilst I agree that the record was cached (and that that is probably
> > technically correct) I can’t then explain why dnsmasq stopped using the
> > cache when I restarted my program - with 45+ minutes of cache left,
> > dnsmasq went back to the upstream server and got a complete answer.
> >
> > Restarting dnsmasq obviously reset the cache, and everything recovered
> > when I did that - but restarting other software shouldn’t have magically
> > reset the cache, and yet it did.
>
>
> I can't explain that. If it's reproducible, run dnsmasq with
> --log-queries set and see exactly what's going on.
>
>
> >
> > (Un)Fortunately the second/third nameservers seem to be being better
> > behaved at the moment, so we haven’t seen the incomplete response in
> > several days - kind of makes it harder to test though.
>
> Not reproducible, then. That's a pity.
>
>
> Cheers,
>
> Simon.
>
> >
> > Cheers,
> >
> > John
> >
> >
> >
> > On Fri, 29 Mar 2019 at 22:43, Simon Kelley wrote:
> >
> > On 21/03/2019 11:01, John Robson wrote:
> > > OK,
> > >
> > > Maybe this does reveal something about the caching...
> > > Which might be expected behaviour, but I am not convinced it's
> > > useful...
> > >
> > > Overnight monitoring has shown that the upstream server does
> > > occasionally send back an incomplete (but perfectly valid) CNAME only
> > > response.  Mostly I can justify the caching behaviour based on the TTLs
> > > of the second CNAME or A record (the server is authoritative for the
> > > first CNAME, so that's always at 3600).
> > >
> > > As a slight aside:
> > > dnsmasq sends a query at 22:57:32.599, then again (new transaction id)
> > > at 22:57:33.601, and at 22:57:36.601.
> > > This last query gets a response in 0.1 seconds, both the others
> > > eventually come in (incomplete) at 22:57:44.073
> > > I am assuming that dnsmasq ignored these late arrivals (either due to a
> > > default timeout, or just because a better answer has been received -
> > > this would be comparable with behaviour when it queries multiple servers
> > > to decide which is 'best').
> > > In this case we are protected by the fact that the incomplete query
> > > takes far longer than the complete one due to timeouts.
> > >
> > > Later though:
> > > At 01:12:47 we are out of TTL, so send a request, and get an incomplete
> > > response... The response only contains the first CNAME, which has a 3600
> > > TTL.
> > >
> > > Then dnsmasq

Re: [Dnsmasq-discuss] Netboot drops DNSMasq DHCP offer

2019-04-04 Thread John Robson
A couple of packet captures might help you (and us) see what is being sent
differently.

On Wed, 3 Apr 2019 at 20:32, Conrad Kostecki wrote:

> Hi,
> in order to make PXE possible with older notebooks, I've compiled
> Netboot for myself.
> This is a piece of software that starts from floppy, from which you can
> load your DOS packet driver and start PXE.
> Basically, it makes it possible to boot with PXE using PCMCIA network
> cards from floppy, when there is nothing else to boot from.
>
> Netboot itself is working fine so far: it initializes itself, loads
> my DOS packet driver and starts DHCP.
> I can clearly see that a DHCP broadcast comes into my DNSMasq, which
> replies with a DHCP offer.
> And here it stops. It seems Netboot can't correctly handle that offer from
> DNSMasq, as it silently drops it and tries again.
> So I see multiple broadcast searches and DHCP offers.
>
> BUT: If I use DHCP from an ordinary AVM Fritz!Box 7490 (router), Netboot
> succeeds and can handle the reply from it.
> So the question is, what could be going wrong? Can I debug this somehow?
> Any solutions to make this work with Netboot?
>
> Note: Netboot is pretty old, the latest release is from 2007. I suspect that
> maybe DNSMasq does something RFC-correct which is "too new" for those old
> clients.
> Any help would be appreciated.
>
> Conrad
> ___
> Dnsmasq-discuss mailing list
> Dnsmasq-discuss@lists.thekelleys.org.uk
> http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss
>


-- 

*John Robson Sr. Customer Support Engineer**, Zenoss
<https://www.zenoss.com/>*
jrob...@zenoss.com | *O:*



Re: [Dnsmasq-discuss] Odd caching behaviour...

2019-03-30 Thread John Robson
Simon,

The upstream server is authoritative for the initial domain (being inside
an organisation I don’t think that’s unusual) and the incomplete (but
perfectly valid, I agree) response is taken as complete. The upstream
server does do recursion as well, but when that failed it just returned
what it could (seems reasonable enough).

I’d have thought that the lack of an actual resolved A record (which is
what was asked for) would mark the cache entry as incomplete at best.
This is pure gut, not a technically based statement.

And whilst I agree that the record was cached (and that that is probably
technically correct) I can’t then explain why dnsmasq stopped using the
cache when I restarted my program - with 45+ minutes of cache left, dnsmasq
went back to the upstream server and got a complete answer.

Restarting dnsmasq obviously reset the cache, and everything recovered when
I did that - but restarting other software shouldn’t have magically reset
the cache, and yet it did.

(Un)Fortunately the second/third nameservers seem to be being better
behaved at the moment, so we haven’t seen the incomplete response in
several days - kind of makes it harder to test though.

Cheers,

John



On Fri, 29 Mar 2019 at 22:43, Simon Kelley wrote:

> On 21/03/2019 11:01, John Robson wrote:
> > OK,
> >
> > Maybe this does reveal something about the caching...
> > Which might be expected behaviour, but I am not convinced it's useful...
> >
> > Overnight monitoring has shown that the upstream server does
> > occasionally send back an incomplete (but perfectly valid) CNAME only
> > response.  Mostly I can justify the caching behaviour based on the TTLs
> > of the second CNAME or A record (the server is authoritative for the
> > first CNAME, so that's always at 3600).
> >
> > As a slight aside:
> > dnsmasq sends a query at 22:57:32.599, then again (new transaction id)
> > at 22:57:33.601, and at 22:57:36.601.
> > This last query gets a response in 0.1 seconds, both the others
> > eventually come in (incomplete) at 22:57:44.073
> > I am assuming that dnsmasq ignored these late arrivals (either due to a
> > default timeout, or just because a better answer has been received -
> > this would be comparable with behaviour when it queries multiple servers
> > to decide which is 'best').
> > In this case we are protected by the fact that the incomplete query
> > takes far longer than the complete one due to timeouts.
> >
> > Later though:
> > At 01:12:47 we are out of TTL, so send a request, and get an incomplete
> > response... The response only contains the first CNAME, which has a 3600
> > TTL.
> >
> > Then dnsmasq doesn't send another query for an hour - despite the fact
> > that it doesn't have a "good" answer.
> > In this case the query it sends after an hour gets incomplete response
> > again - not good.
> > Then I lost track because the container got moved to a different host -
> > but it looks like it was returning incomplete for several hours...
> >
> >
> > dnsmasq is otherwise well behaved - it is still responding to other
> > queries just fine, despite being hammered by more than 2k queries/second
> >
> > Two questions:
> >  - Is it correct/wanted behaviour to cache an incomplete record like
> this?
> > I have no issue caching the cname, but should we keep trying to resolve
> > the cname to an a record?
> >
> >  - Why/How does a restart of the querying program change the caching
> > behaviour of dnsmasq?
> > Because even if the program is restarted after just a few minutes it
> > immediately gets better data - my capture from yesterday shows that
> > despite the fact that the TTL had 2855 seconds (of the 3600 default)
> > left just two minutes before the first 'new process' request comes in,
> > that new request triggers an outbound query.
> >
> >
> > Cheers,
> >
> > John
> >
>
> What you're calling an "incomplete" answer is actually a perfectly
> good answer. Dnsmasq is entitled to infer that the target of the CNAME
> doesn't exist if it's not included in the answer, and keep that
> information in the cache for the TTL period.
>
> Note that this is _only_ true if the upstream server is a recursive
> server - as such it's expected to attempt to follow the CNAME and
> return as much of the chain as exists. If the upstream server is an
> authoritative server, that's not true - if the CNAME target is outside
> the domain(s) that the server is authoritative for, then the target will
> not be included. This is one reason why dnsmasq should only use
> recursive servers, and it will log an error if an upstream server

Re: [Dnsmasq-discuss] The order of nameservers provided by `server=`

2019-03-25 Thread John Robson
Does that not only apply to those in /etc/resolv.conf (or the overridden
file)?

On Mon, 25 Mar 2019 at 17:17, Fox Haxx wrote:

> > Don’t think dnsmasq cares what order they are in, it tests them all and
> chooses the fastest to use.
>
> By default, true, however, note I used the `strict-order` option.
>


Re: [Dnsmasq-discuss] The order of nameservers provided by `server=`

2019-03-25 Thread John Robson
Don’t think dnsmasq cares what order they are in, it tests them all and
chooses the fastest to use.

On Mon, 25 Mar 2019 at 16:00, Fox Haxx wrote:

> Let's say I have this config:
>
> server=
> server=
> resolv-file=/etc/dnsmasq-resolv.conf
> strict-order
>
> By running dnsmasq with `--log-queries` I discovered that  is actually
> listed above :
>
> dnsmasq: using nameserver 
> dnsmasq: using nameserver 
> dnsmasq: using nameserver 192.168.1.1#53
> dnsmasq: read /etc/hosts - 2 addresses
> dnsmasq: query[A] x.org from 127.0.0.1
> dnsmasq: forwarded x.org to 
> dnsmasq: reply x.org is 131.252.210.176
>
> Which is to say, every `server=` server is _prepended_ to servers list,
> rather
> than _appended_.
>
> I think this is counter-intuitive.
>
> So is this intended?
> Is it documented?
> Should it be changed?
>
>


Re: [Dnsmasq-discuss] Odd caching behaviour...

2019-03-21 Thread John Robson
OK,

Maybe this does reveal something about the caching...
Which might be expected behaviour, but I am not convinced it's useful...

Overnight monitoring has shown that the upstream server does occasionally
send back an incomplete (but perfectly valid) CNAME only response.  Mostly
I can justify the caching behaviour based on the TTLs of the second CNAME
or A record (the server is authoritative for the first CNAME, so that's
always at 3600).

As a slight aside:
dnsmasq sends a query at 22:57:32.599, then again (new transaction id) at
22:57:33.601, and at 22:57:36.601.
This last query gets a response in 0.1 seconds, both the others eventually
come in (incomplete) at 22:57:44.073
I am assuming that dnsmasq ignored these late arrivals (either due to a
default timeout, or just because a better answer has been received - this
would be comparable with behaviour when it queries multiple servers to
decide which is 'best').
In this case we are protected by the fact that the incomplete query takes
far longer than the complete one due to timeouts.

Later though:
At 01:12:47 we are out of TTL, so send a request, and get an incomplete
response... The response only contains the first CNAME, which has a 3600
TTL.

Then dnsmasq doesn't send another query for an hour - despite the fact that
it doesn't have a "good" answer.
In this case the query it sends after an hour gets incomplete response
again - not good.
Then I lost track because the container got moved to a different host - but
it looks like it was returning incomplete for several hours...
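
The sequence above can be mimicked with a toy TTL cache. This is a sketch of generic TTL caching, not dnsmasq's actual code: the `TtlCache` class, the `flaky_upstream` function and the 192.0.2.10 address are made up for illustration. It shows why a CNAME-only reply carrying a 3600 second TTL suppresses further upstream queries for an hour.

```python
class TtlCache:
    """Toy cache: keeps whatever the upstream returned until the TTL expires."""

    def __init__(self, upstream):
        self.upstream = upstream      # callable: (name, now) -> (records, ttl)
        self.entries = {}             # name -> (records, expiry time)
        self.upstream_queries = 0

    def lookup(self, name, now):
        cached = self.entries.get(name)
        if cached and now < cached[1]:
            return cached[0]          # served from cache, no upstream traffic
        self.upstream_queries += 1
        records, ttl = self.upstream(name, now)
        self.entries[name] = (records, now + ttl)
        return records


def flaky_upstream(name, now):
    """Hypothetical upstream: CNAME-only at t=0, complete chain afterwards."""
    first = ("CNAME", "foo.bar.com.edgekey.net")
    if now == 0:
        return [first], 3600
    return [first, ("CNAME", "e1234.a.akamaiedge.net"), ("A", "192.0.2.10")], 20


cache = TtlCache(flaky_upstream)
# Times are seconds since the incomplete reply was cached.
answers = [cache.lookup("foo.bar.com", t) for t in (0, 745, 3599, 3600)]
print([len(a) for a in answers], cache.upstream_queries)  # [1, 1, 1, 3] 2
```

At t=745 the entry still has 2855 seconds to run, so a plain TTL cache keeps returning the single CNAME without asking upstream again.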


dnsmasq is otherwise well behaved - it is still responding to other queries
just fine, despite being hammered by more than 2k queries/second

Two questions:
 - Is it correct/wanted behaviour to cache an incomplete record like this?
I have no issue caching the cname, but should we keep trying to resolve the
cname to an a record?

 - Why/How does a restart of the querying program change the caching
behaviour of dnsmasq?
Because even if the program is restarted after just a few minutes it
immediately gets better data - my capture from yesterday shows that despite
the fact that the TTL had 2855 seconds (of the 3600 default) left just two
minutes before the first 'new process' request comes in, that new request
triggers an outbound query.


Cheers,

John

On Wed, 20 Mar 2019 at 23:44, John Robson wrote:

> It is the idea of caching, but not beyond the record TTL surely? And why
> stop only when I reset another piece of software (whether I do that after 5
> minutes or 4 hours).
>
> I'm finding that the upstream server is inconsistent in how much
> information it returns - just occasionally not returning anything beyond
> the first CNAME - which means that this is probably passed on to my program
> as such, which means that something else is involved in triggering it...
>
> I don't expect this to be easy :(
>
> I think we may have found the application bug (it just doesn't know how to
> handle a non IP address return), but I'd still like to understand the
> behaviour from dnsmasq.
>
>
>
> On Wed, 20 Mar 2019 at 23:30, Geert Stappers wrote:
>
>> On Wed, Mar 20, 2019 at 09:00:20PM +0000, John Robson wrote:
>> > Hi,
>> >
>> > I have a library which I think has a bug, but this bug is affecting DNS
>> > queries, and bringing out some odd behaviour in dnsmasq...
>> >
>> > Program is making a query to resolve an address (foo.bar.com)
>> > A normal query results in a CNAME (foo.bar.com.edgekey.net), which
>> results
>> > in another CNAME (e1234.a.akamaiedge.net) which has an A record.
>> >
>> > However every so often dnsmasq returns just the first CNAME.
>> > Note I haven't yet caught it in the act of that first truncated
>> response.
>> > The only thing that makes sense to me is if the edgekey.net name
>> servers
>> > didn't respond in good time... but
>> >
>> > However the bug in the library then means it asks again, instantly.  and
>> > again... and again
>> > It manages over 100MB/ minute of DNS requests - dnsmasq answering them
>> all
>> > from the cache (I see *no* external requests for that address).
>>
>> Hey, that is the idea about DNS caching ...
>>
>>
>> > When I restart the program the very first query (identical query as
>> before)
>> > gets a complete answer from dnsmasq.
>> >
>> > What I can't understand is how that restart makes any difference to
>> dnsmasq.
>> > Does dnsmasq have some sort of 'Oh hell the query load is insane I'm
>> just
>> > extending the cache a bit to help' mode which it then escapes from as
>> the
>> > program restarts?
>> > There are no external queries for this n

Re: [Dnsmasq-discuss] Odd caching behaviour...

2019-03-20 Thread John Robson
It is the idea of caching, but not beyond the record TTL surely? And why
stop only when I reset another piece of software (whether I do that after 5
minutes or 4 hours).

I'm finding that the upstream server is inconsistent in how much
information it returns - just occasionally not returning anything beyond
the first CNAME - which means that this is probably passed on to my program
as such, which means that something else is involved in triggering it...

I don't expect this to be easy :(

I think we may have found the application bug (it just doesn't know how to
handle a non IP address return), but I'd still like to understand the
behaviour from dnsmasq.



On Wed, 20 Mar 2019 at 23:30, Geert Stappers wrote:

> On Wed, Mar 20, 2019 at 09:00:20PM +0000, John Robson wrote:
> > Hi,
> >
> > I have a library which I think has a bug, but this bug is affecting DNS
> > queries, and bringing out some odd behaviour in dnsmasq...
> >
> > Program is making a query to resolve an address (foo.bar.com)
> > A normal query results in a CNAME (foo.bar.com.edgekey.net), which
> results
> > in another CNAME (e1234.a.akamaiedge.net) which has an A record.
> >
> > However every so often dnsmasq returns just the first CNAME.
> > Note I haven't yet caught it in the act of that first truncated response.
> > The only thing that makes sense to me is if the edgekey.net name servers
> > didn't respond in good time... but
> >
> > However the bug in the library then means it asks again, instantly.  and
> > again... and again
> > It manages over 100MB/ minute of DNS requests - dnsmasq answering them
> all
> > from the cache (I see *no* external requests for that address).
>
> Hey, that is the idea about DNS caching ...
>
>
> > When I restart the program the very first query (identical query as
> before)
> > gets a complete answer from dnsmasq.
> >
> > What I can't understand is how that restart makes any difference to
> dnsmasq.
> > Does dnsmasq have some sort of 'Oh hell the query load is insane I'm just
> > extending the cache a bit to help' mode which it then escapes from as the
> > program restarts?
> > There are no external queries for this name during the period of
> insanity,
> > but the first request after does get put to the external name servers.
> >
> > I'm running an 'external interface only' capture to try and capture the
> > initial error condition (which I very much doubt is a problem in
> dnsmasq),
> > to see if that can shed some light on the issue.
> >
> >
> > Thoughts? debug hints? laughter?
>
>
> To me it seems that the first DNS request from the application does
> "recursion".  Upon encountering the bug, the app is doing "non
> recursion". By "recursion" I mean 'when the reply is not an A record,
> do a next query'.
>
> On debug hints:  Currently the suspected trigger of the bug is
> a DNS server that doesn't respond in good time.  So make a "chain"
> of DNS servers where you control the response time of one.
>
> Good luck with it.  And feel welcome to report back.
>
>
> > Cheers,
> > John
>
> Groeten
> Geert Stappers
> --
> Leven en laten leven
>
>


-- 

*John Robson*


[Dnsmasq-discuss] Odd caching behaviour...

2019-03-20 Thread John Robson
Hi,

I have a library which I think has a bug, but this bug is affecting DNS
queries, and bringing out some odd behaviour in dnsmasq...

Program is making a query to resolve an address (foo.bar.com)
A normal query results in a CNAME (foo.bar.com.edgekey.net), which results
in another CNAME (e1234.a.akamaiedge.net) which has an A record.

However every so often dnsmasq returns just the first CNAME.
Note I haven't yet caught it in the act of that first truncated response.
The only thing that makes sense to me is if the edgekey.net name servers
didn't respond in good time... but

However the bug in the library then means it asks again, instantly.  and
again... and again
It manages over 100MB/ minute of DNS requests - dnsmasq answering them all
from the cache (I see *no* external requests for that address).

When I restart the program the very first query (identical query as before)
gets a complete answer from dnsmasq.

What I can't understand is how that restart makes any difference to dnsmasq.
Does dnsmasq have some sort of 'Oh hell the query load is insane I'm just
extending the cache a bit to help' mode which it then escapes from as the
program restarts?
There are no external queries for this name during the period of insanity,
but the first request after does get put to the external name servers.

I'm running an 'external interface only' capture to try and capture the
initial error condition (which I very much doubt is a problem in dnsmasq),
to see if that can shed some light on the issue.


Thoughts? debug hints? laughter?

Cheers,

John

-- 
*John Robson*


Re: [Dnsmasq-discuss] Query forwarding behaviour with multiple name servers.

2019-03-07 Thread John Robson
In the case I was seeing ... one of the three servers was returning
nxdomain for internal queries (user had defined google as a ‘backup’
resolver). So the subsequent replies had massive value (they contained
information, rather than no information).

I’ve removed the ‘backup resolver’ from their config, cloud systems get
very fast response times from google!
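
A small sketch of why that matters when the first reply is the one that gets used. This is an illustrative model only, not dnsmasq's forwarding code: the server labels, the latencies and the 10.0.0.5 address are invented.

```python
def first_reply(replies):
    """Return the lowest-latency reply, as a first-reply-wins forwarder would."""
    return min(replies, key=lambda r: r["latency_ms"])


def first_useful_reply(replies):
    """Alternative policy: the fastest reply that actually carries an answer."""
    useful = [r for r in replies if r["rcode"] == "NOERROR"]
    return first_reply(useful or replies)


# Hypothetical replies for an internal-only name sent to three upstream servers.
replies = [
    {"server": "internal-1", "latency_ms": 12, "rcode": "NOERROR", "answer": "10.0.0.5"},
    {"server": "internal-2", "latency_ms": 15, "rcode": "NOERROR", "answer": "10.0.0.5"},
    {"server": "public", "latency_ms": 3, "rcode": "NXDOMAIN", "answer": None},
]

print(first_reply(replies)["rcode"])          # NXDOMAIN
print(first_useful_reply(replies)["answer"])  # 10.0.0.5
```

The fast public resolver wins the race with a negative answer, which is why removing it from the config was the simpler fix.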





On Thu, 7 Mar 2019 at 18:24, Simon Kelley wrote:

> On 08/02/2019 09:49, John Robson wrote:
> > Hi all,
> >
> > I'm trying to understand the mechanism by which dnsmasq uses the
> > resolvers specified (in this case they are all specified in
> > /etc/resolv.conf).
> > Specifically I am trying to work out why dnsmasq is (erratically)
> > sending the same query to multiple servers, and not listening beyond the
> > first response.
> >
> >
> > As I understand it the default (i.e. non dnsmasq) resolver behaviour is
> > to try the first name server entry first, then the second etc.  This can
> > be changed by use of the 'rotate' option in that file.
> >
> > However, dnsmasq reads its name servers from /etc/resolv.conf, but the
> > defaults are different - relevant options from the man page say:
> > *-o, --strict-order*
> > By default, dnsmasq will send queries to any of the upstream servers
> > it knows about and tries to favour servers that are known to be up.
> > Setting this flag forces dnsmasq to try each query with each server
> > strictly in the order they appear in /etc/resolv.conf
> > *--all-servers*
> > By default, when dnsmasq has more than one upstream server
> > available, it will send queries to just one server. Setting this
> > flag forces dnsmasq to send all queries to all available servers.
> > The reply from the server which answers first will be returned to
> > the original requester.
> >
> > To me that means that, by default, dnsmasq will send to any one of the
> > upstream servers, favouring servers it thinks are up - that seems
> > reasonable.
> >
> >
> > What I am seeing is that sometimes (and I can't figure a packet count, a
> > query count, or a time based correlation) dnsmasq forwards a query to
> > both of the listed name servers (I presume this is part of the
> > 'aliveness' testing?).
> > When this happens dnsmasq then only listens to the first reply, meaning
> > that the server which is slightly slower/further away then gets their
> > response bounced in an ICMP port unreachable message from the dnsmasq
> box.
> >
> > I then see dnsmasq stick to the 'first responding' server until it
> > forwards a query to both again (always in the same order, that listed in
> > /etc/resolv.conf) and, depending on the first response, it either sticks
> > or flips its preferred server until ???
> >
> >
> > Two questions:
> >  - What triggers dnsmasq to forward a query to multiple upstream
> > resolvers (aside from the first query after startup, which seems
> reasonable)
>
> Kevin answered this.
>
> >  - Why does dnsmasq not bother to listen for the second (or more)
> > response - which would surely be useful in terms of timing/aliveness
> > information, as well as less odd for the upstream server*.
>
> Because to do so involves keeping resources around: at least some state
> and an open network socket. Since a server may never respond, those
> resources have to be reclaimed at some point (this function exists
> already, since no answer may be forthcoming from any server). If dnsmasq
> is sending queries to a server which never answers, that implies far
> more resources hanging around during a long timeout, which increases the
> resource footprint for the daemon, and maybe even provides a DoS attack
> opportunity. TBH, it never occurred to me that the subsequent replies
> had any real utility, but I can see that they might. I'm not aware of
> any DNS server which would react in any way to an ICMP port unreachable.
> Don't forget that this is UDP. The server sends the reply "fire and
> forget". I think it would be next to impossible to get the OS to even
> tell the server that the port unreachable message had been seen.
>
>
> Cheers,
>
> Simon.
>
>
> >
> > Cheers,
> >
> > John
> >
> >
> > * I can imagine an upstream server eventually deciding that it is being
> > used in an amplification attack and just not responding any more.
> >
> >
> > --
> >
> >


[Dnsmasq-discuss] Query forwarding behaviour with multiple name servers.

2019-02-08 Thread John Robson
Hi all,

I'm trying to understand the mechanism by which dnsmasq uses the resolvers
specified (in this case they are all specified in /etc/resolv.conf).
Specifically I am trying to work out why dnsmasq is (erratically) sending
the same query to multiple servers, and not listening beyond the first
response.


As I understand it the default (i.e. non dnsmasq) resolver behaviour is to
try the first name server entry first, then the second etc.  This can be
changed by use of the 'rotate' option in that file.

However, dnsmasq reads its name servers from /etc/resolv.conf, but the
defaults are different - relevant options from the man page say:
*-o, --strict-order*
By default, dnsmasq will send queries to any of the upstream servers
it knows about and tries to favour servers that are known to be up.
Setting this flag forces dnsmasq to try each query with each server
strictly in the order they appear in /etc/resolv.conf
*--all-servers*
By default, when dnsmasq has more than one upstream server
available, it will send queries to just one server. Setting this
flag forces dnsmasq to send all queries to all available servers.
The reply from the server which answers first will be returned to
the original requester.

To me that means that, by default, dnsmasq will send to any one of the
upstream servers, favouring servers it thinks are up - that seems
reasonable.
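
Read literally, the man page text above could be modelled as below. This is only a sketch of the behaviour that text describes, not how dnsmasq is implemented: `pick_servers`, the "believed up" flags and the 192.0.2.x addresses are assumptions for illustration.

```python
def pick_servers(servers, strict_order=False, all_servers=False):
    """Return the servers contacted on the first attempt for a query.

    servers: list of (address, believed_up) tuples in resolv.conf order.
    """
    addresses = [addr for addr, _ in servers]
    if all_servers:
        # --all-servers: every query goes to every available server.
        return addresses
    if strict_order:
        # --strict-order: start with the first listed server, whatever its state.
        return addresses[:1]
    # Default: a single server, favouring those believed to be up.
    up = [addr for addr, believed_up in servers if believed_up]
    return (up or addresses)[:1]


# Hypothetical upstream list: the first server is currently believed down.
servers = [("192.0.2.1", False), ("192.0.2.2", True)]
print(pick_servers(servers))                     # ['192.0.2.2']
print(pick_servers(servers, strict_order=True))  # ['192.0.2.1']
print(pick_servers(servers, all_servers=True))   # ['192.0.2.1', '192.0.2.2']
```

Nothing in this reading of the defaults predicts a query going to more than one server, which is the behaviour described next.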


What I am seeing is that sometimes (and I can't figure a packet count, a
query count, or a time based correlation) dnsmasq forwards a query to both
of the listed name servers (I presume this is part of the 'aliveness'
testing?).
When this happens dnsmasq then only listens to the first reply, meaning
that the server which is slightly slower/further away then gets their
response bounced in an ICMP port unreachable message from the dnsmasq box.

I then see dnsmasq stick to the 'first responding' server until it forwards
a query to both again (always in the same order, that listed in
/etc/resolv.conf) and, depending on the first response, it either sticks or
flips its preferred server until ???


Two questions:
 - What triggers dnsmasq to forward a query to multiple upstream resolvers
(aside from the first query after startup, which seems reasonable)
 - Why does dnsmasq not bother to listen for the second (or more) response
- which would surely be useful in terms of timing/aliveness information, as
well as less odd for the upstream server*.

Cheers,

John


* I can imagine an upstream server eventually deciding that it is being
used in an amplification attack and just not responding any more.


--