Re: [Dnsmasq-discuss] Fwd: dig +trace failing

2018-09-19 Thread Simon Kelley
On 19/09/18 13:04, Dominik DL6ER wrote:
> Hey Simon,
> 
> On 19.09.2018 13:27, Simon Kelley wrote:
>> when rd is not set, never answer
>> from the cache, but always forward the query. That would allow dig
>> +trace to work.
>>
>> Does that seem sensible?
> 
> Yes, that seems useful.
> 
> Best,
> Dominik
> 
> 

Change made in git repo.

Cheers,

Simon.


___
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss


Re: [Dnsmasq-discuss] clients of DHCPv6 with constructed IPv6 address range are not notified on address range change

2018-09-19 Thread Andrey Vakhitov
Hello Simon,

> If you look, lots of things are different between the two logs. In the
> second one, dhcpcd is doing routing table changes, for instance. That
> could explain why dnsmasq gives up trying to confirm SLAAC addresses
> because it gets transient "no route to host" returns. (See previous
> reply to make sense of this.)

OK, routing changes are actually the "normal case" for me in this
scenario. Once again: my ISP requires a nightly reconnect. After the
reconnect, the IPv6 address range assigned by the ISP normally changes.
The delegated prefixes allocated by the upstream router change too, and
so do the addresses of the internal interfaces where dnsmasq provides
DHCP and DNS services (new prefix). This is exactly why I want to use the
"ra-names" option: the IPv6 prefixes change every day, and I need name
resolution to reach hosts via IPv6.
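To make the effect of a nightly prefix change concrete, here is a minimal sketch of how a SLAAC address is formed from a /64 prefix and a MAC address using the modified EUI-64 interface identifier (RFC 4291, appendix A). This is the kind of address dnsmasq tries to confirm for ra-names, assuming the host uses EUI-64-based SLAAC rather than privacy or stable-opaque identifiers. The function name `slaac_eui64_addr` is hypothetical and not dnsmasq code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: build a SLAAC address from a /64 prefix and a
   48-bit MAC using the modified EUI-64 interface identifier
   (RFC 4291, appendix A). Assumes EUI-64 SLAAC, not privacy or
   stable-opaque interface identifiers. */
static void slaac_eui64_addr(const unsigned char prefix[8],
                             const unsigned char mac[6],
                             unsigned char out[16])
{
  memcpy(out, prefix, 8);   /* upper 64 bits: the delegated prefix */
  out[8]  = mac[0] ^ 0x02;  /* invert the universal/local bit */
  out[9]  = mac[1];
  out[10] = mac[2];
  out[11] = 0xff;           /* ff:fe inserted between the MAC halves */
  out[12] = 0xfe;
  out[13] = mac[3];
  out[14] = mac[4];
  out[15] = mac[5];
}
```

With prefix 2001:db8:1:2::/64 and MAC 00:11:22:33:44:55 this yields 2001:db8:1:2:211:22ff:fe33:4455. After a reconnect that hands out 2001:db8:1:3::/64, only the upper 64 bits differ, so every such host gets a new address that has to be confirmed again.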

Best Regards,
--
Andrey Vakhitov

E-Mail:  and...@vakhitov.net    Stuttgart, Germany






[Dnsmasq-discuss] Fwd: dig +trace failing

2018-09-19 Thread Dominik DL6ER
Hey Simon,

On 19.09.2018 13:27, Simon Kelley wrote:
> when rd is not set, never answer
> from the cache, but always forward the query. That would allow dig
> +trace to work.
>
> Does that seem sensible?

Yes, that seems useful.

Best,
Dominik




Re: [Dnsmasq-discuss] Seg. fault in cache.c after commit b6f926fb

2018-09-19 Thread Kristian Evensen
On Wed, Sep 19, 2018 at 1:44 PM Simon Kelley  wrote:
> This all makes me slightly uneasy. I think the "out of memory"
> explanation for the crashes you are seeing is not a good one.

No, I agree. I have compiled an OpenWRT image without the fix and
installed it on my device, and I am trying to reproduce the issue.
Will let you know when or if I figure out something.

BR,
Kristian



Re: [Dnsmasq-discuss] Seg. fault in cache.c after commit b6f926fb

2018-09-19 Thread Simon Kelley



On 19/09/18 08:59, Kristian Evensen wrote:
> Hi Simon,
> 
> Thanks for a quick reply.
> 
> On Wed, Sep 19, 2018 at 12:23 AM Simon Kelley  wrote:
>> Thanks for the report. The obvious explanation is that whine_malloc() is
>> returning NULL, and the code should handle that. whine_malloc only
>> returns NULL if the system cannot allocate any more memory, which is
>> possible, but unlikely. Is your router very short on memory?
> 
> No, the router has plenty of memory (2GB) and I don't see the "failed
> to allocate"-message, so I guess whine_malloc() can't be the culprit.
> Since I am using OpenWRT, there could be some defines affecting the
> line numbers. I tried to read up on how ifdefs affect line numbers in
> gdb backtraces to see if the error could be somewhere other than the
> "default" line 1437, but unfortunately I couldn't find anything.
> Probably my Google-fu is a bit rusty.
> 
> When looking over my notes, I see that I have made the following
> observations related to this bug:
> 
> * The crash happens quite rarely.
> * I have only seen the bug right after boot.
> * When the bug strikes, dnsmasq will enter a crash loop and never
> recover. I.e., I can restart dnsmasq as many times as I like; the
> crash always happens.
> * If I start dnsmasq manually and run it in the foreground after a
> crash, I also see the error.
> 
> So there seems to be something in the system causing this error, but I
> can't figure out what.
> 
>> I think the best solution is to wrap all of
>>
>>   *crecp = *source;
>>   crecp->flags &= ~(F_IPV4 | F_IPV6 | F_CNAME | F_DNSKEY | F_DS |
>> F_REVERSE);
>>   crecp->flags |= F_NAMEP;
>>   crecp->name.namep = name;
>>
>>   cache_hash(crecp);
>>
>> with
>>
>> if (crecp)
>> {
>> }
> 
> Thanks, this is basically the same as my current fix, so I can already
> report that it is good :)
> 

This all makes me slightly uneasy. I think the "out of memory"
explanation for the crashes you are seeing is not a good one.

The patch is definitely needed. The philosophy for memory allocation
failures is that the code should not terminate: it logs an error and
continues, though not necessarily behaving correctly. That's about the
best you can do.

Unfortunately, with the patch, if crecp is NULL at this point for some
other reason, it will now fail silently and behave wrongly (the
non-terminal cache entry will not be created).
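The guard under discussion can be illustrated with a self-contained sketch. Assumptions: `struct crec`, the `F_*` values and `cache_hash()` below are simplified, hypothetical stand-ins for dnsmasq's own definitions, and `make_nonterminal()` and `whine_malloc_sketch()` are illustrative names; only the body inside `if (crecp)` mirrors the quoted lines. It shows both sides of the trade-off: a NULL `crecp` no longer crashes, but the non-terminal entry is then silently not created.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical flag values for illustration; dnsmasq's real F_* differ. */
#define F_IPV4    (1u << 0)
#define F_IPV6    (1u << 1)
#define F_CNAME   (1u << 2)
#define F_DNSKEY  (1u << 3)
#define F_DS      (1u << 4)
#define F_REVERSE (1u << 5)
#define F_NAMEP   (1u << 6)

/* Simplified stand-in for dnsmasq's struct crec. */
struct crec {
  unsigned int flags;
  union { char *namep; } name;
  int hashed;                 /* set by the stub cache_hash() below */
};

/* Stub: the real cache_hash() inserts the entry into the hash table. */
static void cache_hash(struct crec *crecp)
{
  crecp->hashed = 1;
}

/* Stand-in for whine_malloc(): log the failure, return NULL, keep running. */
static void *whine_malloc_sketch(size_t size)
{
  void *p = calloc(1, size);
  if (!p)
    fprintf(stderr, "failed to allocate %zu bytes\n", size);
  return p;
}

/* Turn a freshly allocated entry into a non-terminal copy of SOURCE.
   If the allocation failed (crecp == NULL), nothing is created. */
static struct crec *make_nonterminal(const struct crec *source, char *name,
                                     struct crec *crecp)
{
  if (crecp)
    {
      *crecp = *source;
      crecp->flags &= ~(F_IPV4 | F_IPV6 | F_CNAME | F_DNSKEY | F_DS |
                        F_REVERSE);
      crecp->flags |= F_NAMEP;
      crecp->name.namep = name;

      cache_hash(crecp);
    }
  return crecp;
}
```

Calling `make_nonterminal(&src, name, whine_malloc_sketch(sizeof(struct crec)))` returns NULL without crashing when memory runs out, which is exactly why a NULL arriving for some other reason would go unnoticed.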

It would be really good to find out what actually causes this, and
preferably a way to reproduce it.

Cheers,

Simon.






[Dnsmasq-discuss] Duplicate IP detection with fixed IP

2018-09-19 Thread Simon Kelley


On 19/09/18 11:09, Bernard CLABOTS wrote:
> Thanks a lot for this answer.
> 
> Indeed, it is a special case as we have a simple two-way Request/ACK;
> this is also what is seen with some implementations when quickly
> unplugging/re-plugging the cable, and it is legal AFAIK.
> 
> I also agree on the necessity to be efficient in case of loss of the
> lease dB.
> 
> Yet reading the RFC-2131, I saw:
>   If the client's request is invalid (e.g., the client has moved
>   to a new subnet), servers SHOULD respond with a DHCPNAK message to
>   the client. Servers SHOULD NOT respond if their information is not
>   guaranteed to be accurate.  For example, a server that identifies a
>   request for an expired binding that is owned by another server SHOULD
>   NOT respond with a DHCPNAK unless the servers are using an explicit
>   mechanism to maintain coherency among the servers.
> 
> Referring to the first sentence, I agree it is only a SHOULD.
> However, according to your explanation, the next sentence is also
> relevant in this case, so DNSMasq should not respond if its information
> is not guaranteed to be accurate. This also means that if we change the
> authoritative flag, we risk ending up in the example case where
> DNSMasq cannot guarantee whether the requested IP belongs to another
> DHCP server, so it should not NAK, and we are going in circles...
> We could of course debate whether the Request is invalid simply because
> that IP is currently used by another device that did not even get it
> through DHCP. I would argue that the DNSMasq code explicitly accepts that
> a request for the server's own IP fulfils this condition, which IMHO is a
> similar case.
>
> Anyhow, moving forward to resolve the issue I face, is there any way to
> force the RFC behavior of NAK-ing and forcing the 4 way exchange?
> 

If you don't set dhcp-authoritative, then the client will eventually
move to the four-way exchange, but it may take some time, as it involves
time-outs. The reason for this is that the dnsmasq server has to assume
there are other DHCP servers on the network which may hold a lease for
the client.

The differences in behaviour are these.

Without dhcp-authoritative:

1) A client sending DHCPREQUEST in init-reboot state which doesn't have
a lease in the database will be ignored.

2) A client sending a DHCPREQUEST in rebind mode which doesn't have a
lease in the database will be ignored. In renew mode (i.e. unicast
request) it will get a DHCPNAK.

3) A client sending a request with the wrong server-id will be ignored.

With dhcp-authoritative:

1) A client sending DHCPREQUEST in init-reboot state which doesn't have
a lease will have the lease created.

2) A client sending a DHCPREQUEST in renew or rebind mode which doesn't
have a lease in the database will have a lease created.

3) A client sending a request in INIT_REBOOT or SELECTING state with
the wrong server-id will get a DHCPNAK.
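The two lists of rules above can be condensed into one decision function. This is a sketch, not dnsmasq's rfc2131.c logic: `request_action()` and the enum names are hypothetical, `wrong_server_id` means the request carries a server-id that is not this server's, and every case the lists do not mention falls through to `ACT_NORMAL` (ordinary processing, e.g. the four-way exchange).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the client states and outcomes in the message. */
enum client_state { ST_SELECTING, ST_INIT_REBOOT, ST_RENEWING, ST_REBINDING };
enum req_action { ACT_IGNORE, ACT_NAK, ACT_CREATE_LEASE, ACT_NORMAL };

/* Encodes only the three rules listed for each mode. */
static enum req_action request_action(bool authoritative,
                                      enum client_state st,
                                      bool has_lease,
                                      bool wrong_server_id)
{
  if (wrong_server_id)
    {
      if (!authoritative)
        return ACT_IGNORE;                        /* rule 3, without */
      if (st == ST_INIT_REBOOT || st == ST_SELECTING)
        return ACT_NAK;                           /* rule 3, with */
      return ACT_NORMAL;
    }

  if (!has_lease)
    {
      if (authoritative)
        {
          if (st == ST_INIT_REBOOT || st == ST_RENEWING || st == ST_REBINDING)
            return ACT_CREATE_LEASE;              /* rules 1 and 2, with */
        }
      else
        {
          if (st == ST_INIT_REBOOT || st == ST_REBINDING)
            return ACT_IGNORE;                    /* rules 1 and 2, without */
          if (st == ST_RENEWING)
            return ACT_NAK;                       /* rule 2: unicast renew */
        }
    }

  return ACT_NORMAL;
}
```

The non-authoritative branches return "ignore" wherever another server might hold the lease, which is why the client only falls back to the four-way exchange after its own time-outs.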


Cheers,

Simon.





Re: [Dnsmasq-discuss] dig +trace failing

2018-09-19 Thread Simon Kelley
The change in question causes dnsmasq to always return SERVFAIL for
queries without the "use recursion" bit set.


The relevant quote in the reference

http://cs.unc.edu/~fabian/course_papers/cache_snooping.pdf

is this:

Recommendation 2: secondly, and most importantly, non-authoritative
requests to DNS caches should not be allowed. For instance dnscache, a
popular caching-only DNS implementation, tries to prevent cache
snooping by refusing to answer non-recursive queries [3]. Another option
is to never consult the cache when responding to non-RD queries.

So dnsmasq could adopt the alternative: when rd is not set, never answer
from the cache, but always forward the query. That would allow dig
+trace to work.

Does that seem sensible?
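A minimal sketch of the proposed rule, assuming the raw query is available as a byte buffer: the RD flag is the low bit of the third header byte (RFC 1035, section 4.1.1). `choose_path()` and the `PATH_*` values are hypothetical names, not dnsmasq's code; the sketch only shows the decision, not the forwarding itself.

```c
#include <assert.h>
#include <stddef.h>

#define DNS_HEADER_LEN 12    /* fixed DNS header size (RFC 1035, 4.1.1) */
#define DNS_RD_BIT     0x01  /* RD is the low bit of header byte 2 */

/* Hypothetical outcomes for an incoming query. */
enum query_path { PATH_DROP, PATH_CACHE_OR_FORWARD, PATH_FORWARD_ONLY };

/* Proposed policy: with RD set, take the normal path (the cache may
   answer); with RD clear, never consult the cache and always forward
   the query upstream, so the answer does not come from the cache. */
static enum query_path choose_path(const unsigned char *pkt, size_t len)
{
  if (len < DNS_HEADER_LEN)
    return PATH_DROP;               /* too short to be a DNS message */
  if (pkt[2] & DNS_RD_BIT)
    return PATH_CACHE_OR_FORWARD;
  return PATH_FORWARD_ONLY;
}
```

Compared with refusing non-RD queries (the dnscache approach quoted above), forwarding keeps non-recursive clients such as dig +trace working while still keeping cache contents out of the reply.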


Cheers,

Simon.



On 19/09/18 11:16, Dominik DL6ER wrote:
> Dear list members,
> 
> I expect "dig +trace" to show a trace of the delegation path from the
> root name servers for the name being looked up. This behavior has been
> broken since commit 087eb76140725f8f1892ba6f251ea052d3440966 and is
> still not fixed (I compiled and tested the most recent "master" branch
> of dnsmasq).
> 
> With dnsmasq v2.80test6 and v2.79, I see:
> 
> $ dig +trace www.example.com
> ; <<>> DiG 9.10.3-P4-Ubuntu <<>> +trace www.example.com
> ;; global options: +cmd
> ;; Received 17 bytes from 192.168.2.11#53(pi.hole) in 76 ms
> 
> With dnsmasq v2.78 (and previously), I see:
> 
> $ dig +trace www.example.com
> ; <<>> DiG 9.10.3-P4-Ubuntu <<>> +trace www.example.com
> ;; global options: +cmd
> .            42569    IN    NS    l.root-servers.net.
> .            42569    IN    NS    k.root-servers.net.
> .            42569    IN    NS    e.root-servers.net.
> .            42569    IN    NS    h.root-servers.net.
> .            42569    IN    NS    j.root-servers.net.
> .            42569    IN    NS    i.root-servers.net.
> .            42569    IN    NS    g.root-servers.net.
> .            42569    IN    NS    a.root-servers.net.
> .            42569    IN    NS    b.root-servers.net.
> .            42569    IN    NS    m.root-servers.net.
> .            42569    IN    NS    c.root-servers.net.
> .            42569    IN    NS    f.root-servers.net.
> .            42569    IN    NS    d.root-servers.net.
> ;; Received 241 bytes from 192.168.2.11#53(pi.hole) in 115 ms
> 
> 
> Best regards,
> Dominik



Re: [Dnsmasq-discuss] Duplicate IP detection with fixed IP

2018-09-19 Thread Bernard CLABOTS
Thanks a lot for this answer.
Indeed, it is a special case as we have a simple two-way Request/ACK; this is
also what is seen with some implementations when quickly unplugging/re-plugging
the cable, and it is legal AFAIK.
I also agree on the necessity to be efficient in case of loss of the lease dB.
Yet reading RFC 2131, I saw:

  If the client's request is invalid (e.g., the client has moved
  to a new subnet), servers SHOULD respond with a DHCPNAK message to
  the client. Servers SHOULD NOT respond if their information is not
  guaranteed to be accurate.  For example, a server that identifies a
  request for an expired binding that is owned by another server SHOULD
  NOT respond with a DHCPNAK unless the servers are using an explicit
  mechanism to maintain coherency among the servers.

Referring to the first sentence, I agree it is only a SHOULD. However,
according to your explanation, the next sentence is also relevant in this
case, so DNSMasq should not respond if its information is not guaranteed to
be accurate. This also means that if we change the authoritative flag, we
risk ending up in the example case where DNSMasq cannot guarantee whether the
requested IP belongs to another DHCP server, so it should not NAK, and we are
going in circles... We could of course debate whether the Request is invalid
simply because that IP is currently used by another device that did not even
get it through DHCP. I would argue that the DNSMasq code explicitly accepts
that a request for the server's own IP fulfils this condition, which IMHO is
a similar case.
Anyhow, moving forward to resolve the issue I face, is there any way to force
the RFC behavior of NAK-ing and forcing the 4-way exchange?
Thanks a lot!
Regards,
Bernard

On Wednesday, 19 September 2018, 1:16, Simon Kelley wrote:

On 18/09/18 16:59, Bernard CLABOTS wrote:
> Hi all,
>    I have been trying to replicate an IP conflict issue on Open-WRT.
> The issue is seen randomly, and I expect that in real life it is caused
> by the lease database getting out of sync with the actual situation
> (e.g. when a switch sits between the client and the server and the
> server is rebooted, so that the client acts as though it had a fixed
> IP). It has also been reported when moving a client from one setup to
> another setup where the IP it used to receive is in use on the LAN.
> 
>    I tested with 2 different versions of dnsmasq (2.78 and 2.79).
> 
>    I use Scapy to forge DHCP Requests (see further).
> 
> Setup:
> I have a laptop with a fixed IP inside the range of the DHCP
> (192.168.1.0/26). I then forge a Request of that IP using scapy and I
> cannot explain the behavior:
> 1. I see no ARP whatsoever to the requested IP when DNSMasq handles the
> request.
> 2. When I request the fixed IP for a client with a random MAC, I
> instantly receive an ACK, then I see some unanswered ARP requests
> (*after*) as to "who has [IP just assigned]? Tell 192.168.1.1" where
> 192.168.1.1 is the DHCP server IP.
> 
> I end up in a situation where the dhcp.leases contains the fake MAC
> associated to the lease, while the ARP table contains the MAC of the
> fixed IP laptop (probably because I'm not sending any IP packet where
> the IP is associated to the fake MAC, so the switch cannot learn it).
> 
> I have observed that Windows 10 has a mechanism to prevent conflicts:
> whenever a fixed IP is configured, an ARP probe for its own IP is sent
> once the link is up. If the probe is answered, the client keeps silent
> and starts using a link-local IPv4 address (169.254.x.x). Yet I have
> tested with a very old laptop running Windows 3.1 and I can replicate
> the issue.
> But basically, it is puzzling that the device is ARPing *after* the DHCP
> distributed the IP.
> 
> *The whole issue seems to boil down to:* why does DNSMasq not check if
> the IP is free before assigning it?
> I thought that unless option "-5" or "--no-ping" was set, DNSMasq would
> always ping the assigned IP once *before* assignment (I checked the
> code and see that there is actually a mechanism to store the
> positive identification as well as to blacklist IPs in case a client is
> constantly coming back).
> The only ARP I see in this case is *after* the IP is assigned. How come
> DNSMasq is not trying to ping before assignment? Is there an option to
> force this behavior (from the code I guess not)? Is DNSMasq also somehow
> relying on the ARP table and the flags set on reachability, or solely
> on the _absence_ of a reply to the ping?
> 
> Thanks a lot for your assistance.
> 
> Regards,
> Bernard
> 
> Scapy forged packet (I know the source MAC does not match the client
> MAC, but I deem this good enough for testing, AFAIK it is a legal packet):
> dhcp_request = Ether(dst='ff:ff:ff:ff:ff:ff')/IP(src='0.0.0.0',
> dst='255.255.255.255')/UDP(dport=67,
> sport=68)/BOOTP(xid=RandInt())/DHCP(options=[('message-type',
> 

[Dnsmasq-discuss] dig +trace failing

2018-09-19 Thread Dominik DL6ER
Dear list members,

I expect "dig +trace" to show a trace of the delegation path from the
root name servers for the name being looked up. This behavior has been
broken since commit 087eb76140725f8f1892ba6f251ea052d3440966 and is
still not fixed (I compiled and tested the most recent "master" branch
of dnsmasq).

With dnsmasq v2.80test6 and v2.79, I see:

$ dig +trace www.example.com
; <<>> DiG 9.10.3-P4-Ubuntu <<>> +trace www.example.com
;; global options: +cmd
;; Received 17 bytes from 192.168.2.11#53(pi.hole) in 76 ms

With dnsmasq v2.78 (and previously), I see:

$ dig +trace www.example.com
; <<>> DiG 9.10.3-P4-Ubuntu <<>> +trace www.example.com
;; global options: +cmd
.            42569    IN    NS    l.root-servers.net.
.            42569    IN    NS    k.root-servers.net.
.            42569    IN    NS    e.root-servers.net.
.            42569    IN    NS    h.root-servers.net.
.            42569    IN    NS    j.root-servers.net.
.            42569    IN    NS    i.root-servers.net.
.            42569    IN    NS    g.root-servers.net.
.            42569    IN    NS    a.root-servers.net.
.            42569    IN    NS    b.root-servers.net.
.            42569    IN    NS    m.root-servers.net.
.            42569    IN    NS    c.root-servers.net.
.            42569    IN    NS    f.root-servers.net.
.            42569    IN    NS    d.root-servers.net.
;; Received 241 bytes from 192.168.2.11#53(pi.hole) in 115 ms


Best regards,
Dominik



Re: [Dnsmasq-discuss] Seg. fault in cache.c after commit b6f926fb

2018-09-19 Thread Kevin Darbyshire-Bryant


> On 19 Sep 2018, at 08:59, Kristian Evensen  wrote:
> 
> Hi Simon,
> 
> Thanks for a quick reply.
> 
> On Wed, Sep 19, 2018 at 12:23 AM Simon Kelley  wrote:
>> Thanks for the report. The obvious explanation is that whine_malloc() is
>> returning NULL, and the code should handle that. whine_malloc only
>> returns NULL if the system cannot allocate any more memory, which is
>> possible, but unlikely. Is your router very short on memory?
> 
> No, the router has plenty of memory (2GB) and I don't see the "failed
> to allocate"-message, so I guess whine_malloc() can't be the culprit.
> Since I am using OpenWRT, there could be some defines affecting the
> line numbers. I tried to read up on how ifdefs affect line numbers in
> gdb backtraces to see if the error could be somewhere other than the
> "default" line 1437, but unfortunately I couldn't find anything.
> Probably my Google-fu is a bit rusty.
> 
> When looking over my notes, I see that I have made the following
> observations related to this bug:
> 
> * The crash happens quite rarely.
> * I have only seen the bug right after boot.
> * When the bug strikes, dnsmasq will enter a crash loop and never
> recover. I.e., I can restart dnsmasq as many times as I like; the
> crash always happens.
> * If I start dnsmasq manually and run it in the foreground after a
> crash, I also see the error.
> 
> So there seems to be something in the system causing this error, but I
> can't figure out what.
> 
>> I think the best solution is to wrap all of
>> 
>>  *crecp = *source;
>>  crecp->flags &= ~(F_IPV4 | F_IPV6 | F_CNAME | F_DNSKEY | F_DS |
>> F_REVERSE);
>>  crecp->flags |= F_NAMEP;
>>  crecp->name.namep = name;
>> 
>>  cache_hash(crecp);
>> 
>> with
>> 
>> if (crecp)
>> {
>> }
> 
> Thanks, this is basically the same as my current fix, so I can already
> report that it is good :)
> 
> BR,
> Kristian
> 

And I backported the fix into openwrt master this morning.


Cheers,

Kevin D-B

012C ACB2 28C6 C53E 9775  9123 B3A2 389B 9DE2 334A





Re: [Dnsmasq-discuss] Seg. fault in cache.c after commit b6f926fb

2018-09-19 Thread Kristian Evensen
Hi Simon,

Thanks for a quick reply.

On Wed, Sep 19, 2018 at 12:23 AM Simon Kelley  wrote:
> Thanks for the report. The obvious explanation is that whine_malloc() is
> returning NULL, and the code should handle that. whine_malloc only
> returns NULL if the system cannot allocate any more memory, which is
> possible, but unlikely. Is your router very short on memory?

No, the router has plenty of memory (2GB) and I don't see the "failed
to allocate"-message, so I guess whine_malloc() can't be the culprit.
Since I am using OpenWRT, there could be some defines affecting the
line numbers. I tried to read up on how ifdefs affect line numbers in
gdb backtraces to see if the error could be somewhere other than the
"default" line 1437, but unfortunately I couldn't find anything.
Probably my Google-fu is a bit rusty.

When looking over my notes, I see that I have made the following
observations related to this bug:

* The crash happens quite rarely.
* I have only seen the bug right after boot.
* When the bug strikes, dnsmasq will enter a crash loop and never
recover. I.e., I can restart dnsmasq as many times as I like; the
crash always happens.
* If I start dnsmasq manually and run it in the foreground after a
crash, I also see the error.

So there seems to be something in the system causing this error, but I
can't figure out what.

> I think the best solution is to wrap all of
>
>   *crecp = *source;
>   crecp->flags &= ~(F_IPV4 | F_IPV6 | F_CNAME | F_DNSKEY | F_DS |
> F_REVERSE);
>   crecp->flags |= F_NAMEP;
>   crecp->name.namep = name;
>
>   cache_hash(crecp);
>
> with
>
> if (crecp)
> {
> }

Thanks, this is basically the same as my current fix, so I can already
report that it is good :)

BR,
Kristian
