Re: [Dnsmasq-discuss] Seg. fault in cache.c after commt b6f926fb

2018-09-19 Thread Kristian Evensen
On Wed, Sep 19, 2018 at 1:44 PM Simon Kelley  wrote:
> This all makes me slightly uneasy. I think the "out of memory"
> explanation for the crashes you are seeing is not a good one.

No, I agree. I have compiled an OpenWRT image without the fix and
installed it on my device, and I am trying to reproduce the issue.
Will let you know when or if I figure out something.

BR,
Kristian

___
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss


Re: [Dnsmasq-discuss] Seg. fault in cache.c after commt b6f926fb

2018-09-19 Thread Simon Kelley



On 19/09/18 08:59, Kristian Evensen wrote:
> Hi Simon,
> 
> Thanks for a quick reply.
> 
> On Wed, Sep 19, 2018 at 12:23 AM Simon Kelley  wrote:
>> Thanks for the report. The obvious explanation is that whine_malloc() is
>> returning NULL, and the code should handle that. whine_malloc only
>> returns NULL if the system cannot allocate any more memory, which is
>> possible, but unlikely. Is your router very short on memory?
> 
> No, the router has plenty of memory (2GB) and I don't see the "failed
> to allocate"-message, so I guess whine_malloc() can't be the culprit.
> Since I am using OpenWRT, there could be some defines affecting the
> line numbers. I tried to read up on how ifdefs affects line numbers in
> gdb backtraces to see if the error could be somewhere else than the
> "default" line 1437, but I unfortunately couldn't find anything.
> Probably my google-foo is a bit rusty.
> 
> When looking over my notes, I see that I have made the following
> observations related to this bug:
> 
> * Crash happens quite rarely.
> * I have only seen the bug right after boot.
> * When the bug strikes, dnsmasq will enter a crash loop and never
> recover. I.e., I can restart dnsmasq as many times as I like, crash
> always happens.
> * If I start dnsmasq manually and run it in the foreground after a
> crash, I also see the error.
> 
> So there seems to be something in the system causing this error, but I
> can't figure out what.
> 
>> I think the best solution is to wrap all of
>>
>>   *crecp = *source;
>>   crecp->flags &= ~(F_IPV4 | F_IPV6 | F_CNAME | F_DNSKEY | F_DS |
>> F_REVERSE);
>>   crecp->flags |= F_NAMEP;
>>   crecp->name.namep = name;
>>
>>   cache_hash(crecp);
>>
>> with
>>
>> if (crecp)
>> {
>> }
> 
> Thanks, this is basically the same as my current fix, so I can already
> report that it is good :)
> 

This all makes me slightly uneasy. I think the "out of memory"
explanation for the crashes you are seeing is not a good one.

The patch is definitely needed. The philosophy for memory allocation
failures is that the code should be not terminate: it logs an error and
continues, but not necessarily behaving correctly. That's about the best
you can do.

Unfortunately, with the patch, if crecp is NULL at this point for some
other reason, it will now fail silently, and behave wrongly (the
non-terminal cache entry will not be created)

It would be really good to find out what actually causes this, and
preferably a way to reproduce it.

Cheers,

Simon.




___
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss


Re: [Dnsmasq-discuss] Seg. fault in cache.c after commt b6f926fb

2018-09-19 Thread Kevin Darbyshire-Bryant


> On 19 Sep 2018, at 08:59, Kristian Evensen  wrote:
> 
> Hi Simon,
> 
> Thanks for a quick reply.
> 
> On Wed, Sep 19, 2018 at 12:23 AM Simon Kelley  wrote:
>> Thanks for the report. The obvious explanation is that whine_malloc() is
>> returning NULL, and the code should handle that. whine_malloc only
>> returns NULL if the system cannot allocate any more memory, which is
>> possible, but unlikely. Is your router very short on memory?
> 
> No, the router has plenty of memory (2GB) and I don't see the "failed
> to allocate"-message, so I guess whine_malloc() can't be the culprit.
> Since I am using OpenWRT, there could be some defines affecting the
> line numbers. I tried to read up on how ifdefs affects line numbers in
> gdb backtraces to see if the error could be somewhere else than the
> "default" line 1437, but I unfortunately couldn't find anything.
> Probably my google-foo is a bit rusty.
> 
> When looking over my notes, I see that I have made the following
> observations related to this bug:
> 
> * Crash happens quite rarely.
> * I have only seen the bug right after boot.
> * When the bug strikes, dnsmasq will enter a crash loop and never
> recover. I.e., I can restart dnsmasq as many times as I like, crash
> always happens.
> * If I start dnsmasq manually and run it in the foreground after a
> crash, I also see the error.
> 
> So there seems to be something in the system causing this error, but I
> can't figure out what.
> 
>> I think the best solution is to wrap all of
>> 
>>  *crecp = *source;
>>  crecp->flags &= ~(F_IPV4 | F_IPV6 | F_CNAME | F_DNSKEY | F_DS |
>> F_REVERSE);
>>  crecp->flags |= F_NAMEP;
>>  crecp->name.namep = name;
>> 
>>  cache_hash(crecp);
>> 
>> with
>> 
>> if (crecp)
>> {
>> }
> 
> Thanks, this is basically the same as my current fix, so I can already
> report that it is good :)
> 
> BR,
> Kristian
> 

And I backported the fix into openwrt master this morning.


Cheers,

Kevin D-B

012C ACB2 28C6 C53E 9775  9123 B3A2 389B 9DE2 334A



signature.asc
Description: Message signed with OpenPGP
___
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss


Re: [Dnsmasq-discuss] Seg. fault in cache.c after commt b6f926fb

2018-09-19 Thread Kristian Evensen
Hi Simon,

Thanks for a quick reply.

On Wed, Sep 19, 2018 at 12:23 AM Simon Kelley  wrote:
> Thanks for the report. The obvious explanation is that whine_malloc() is
> returning NULL, and the code should handle that. whine_malloc only
> returns NULL if the system cannot allocate any more memory, which is
> possible, but unlikely. Is your router very short on memory?

No, the router has plenty of memory (2GB) and I don't see the "failed
to allocate"-message, so I guess whine_malloc() can't be the culprit.
Since I am using OpenWRT, there could be some defines affecting the
line numbers. I tried to read up on how ifdefs affects line numbers in
gdb backtraces to see if the error could be somewhere else than the
"default" line 1437, but I unfortunately couldn't find anything.
Probably my google-foo is a bit rusty.

When looking over my notes, I see that I have made the following
observations related to this bug:

* Crash happens quite rarely.
* I have only seen the bug right after boot.
* When the bug strikes, dnsmasq will enter a crash loop and never
recover. I.e., I can restart dnsmasq as many times as I like, crash
always happens.
* If I start dnsmasq manually and run it in the foreground after a
crash, I also see the error.

So there seems to be something in the system causing this error, but I
can't figure out what.

> I think the best solution is to wrap all of
>
>   *crecp = *source;
>   crecp->flags &= ~(F_IPV4 | F_IPV6 | F_CNAME | F_DNSKEY | F_DS |
> F_REVERSE);
>   crecp->flags |= F_NAMEP;
>   crecp->name.namep = name;
>
>   cache_hash(crecp);
>
> with
>
> if (crecp)
> {
> }

Thanks, this is basically the same as my current fix, so I can already
report that it is good :)

BR,
Kristian

___
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss


Re: [Dnsmasq-discuss] Seg. fault in cache.c after commt b6f926fb

2018-09-18 Thread Simon Kelley
On 18/09/18 11:28, Kristian Evensen wrote:
> Hello,
> 
> I recently updated one of my x86-based OpenWRT-routers to the latest
> nightly, which bumped dnsmasq to 2.80test6. After the update, I see
> that dnsmasq sometimes enters a crash loop. The crash occurs right
> startup (SIGSEV), and the backtrace, looks as follows:
> 
> 0x00406a78 in make_non_terminals
> (source=source@entry=0x77ffefa0) at cache.c:1437
> 1437 cache.c: No such file or directory.
> (gdb) bt
> #0  0x00406a78 in make_non_terminals
> (source=source@entry=0x77ffefa0) at cache.c:1437
> #1  0x00407192 in add_hosts_entry (cache=0x77ffefa0,
> addr=0x7fffea28, addrlen=4, index=2, rhash=,
> hashsz=) at cache.c:911
> #2  0x004074e1 in read_hostsfile
> (filename=filename@entry=0x425a8a "/etc/hosts", index=index@entry=2,
> cache_size=cache_size@entry=150, rhash=0x635020,
> hashsz=hashsz@entry=641) at cache.c:1040
> #3  0x00407810 in cache_reload () at cache.c:1185
> #4  0x00418037 in clear_cache_and_reload
> (now=now@entry=1537265437) at dnsmasq.c:1547
> #5  0x00405ec3 in async_event (now=1537265437, pipe=15) at
> dnsmasq.c:1310
> #6  main (argc=, argv=) at dnsmasq.c:1077
> 
> My current work-around is to check if crecp is NULL and then return
> from make_non_terminals(). However, I don't know the code well enough
> to know if this change will break anything else (though everything
> seems to work fine).
> 
> There is nothing special with my hosts file, it looks as follows:
> 
> 127.0.0.1 localhost
> 
> ::1 localhost ip6-localhost ip6-loopback
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
> 192.168.5.1 myrouter.lan
> 
> If checking for NULL is acceptable, I will be more than happy to send
> my patch. If not, I wonder if anyone has any hints on how to proceed
> to solve this issue?
> 

Thanks for the report. The obvious explanation is that whine_malloc() is
returning NULL, and the code should handle that. whine_malloc only
returns NULL if the system cannot allocate any more memory, which is
possible, but unlikely. Is your router very short on memory?
whine_malloc() is just a wrapper around malloc that logs an error if the
malloc fails (hence the name) Are you seeing messages like

failed to allocate %d bytes

just before the crashes? That would make me more comfortable that we
understand the problem. I think the best solution is to wrap all of

  *crecp = *source;
  crecp->flags &= ~(F_IPV4 | F_IPV6 | F_CNAME | F_DNSKEY | F_DS |
F_REVERSE);
  crecp->flags |= F_NAMEP;
  crecp->name.namep = name;

  cache_hash(crecp);

with

if (crecp)
{
}


and I'll commit that to the git repo now.

Cheers,

Simon

___
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss


[Dnsmasq-discuss] Seg. fault in cache.c after commt b6f926fb

2018-09-18 Thread Kristian Evensen
Hello,

I recently updated one of my x86-based OpenWRT-routers to the latest
nightly, which bumped dnsmasq to 2.80test6. After the update, I see
that dnsmasq sometimes enters a crash loop. The crash occurs right
startup (SIGSEV), and the backtrace, looks as follows:

0x00406a78 in make_non_terminals
(source=source@entry=0x77ffefa0) at cache.c:1437
1437 cache.c: No such file or directory.
(gdb) bt
#0  0x00406a78 in make_non_terminals
(source=source@entry=0x77ffefa0) at cache.c:1437
#1  0x00407192 in add_hosts_entry (cache=0x77ffefa0,
addr=0x7fffea28, addrlen=4, index=2, rhash=,
hashsz=) at cache.c:911
#2  0x004074e1 in read_hostsfile
(filename=filename@entry=0x425a8a "/etc/hosts", index=index@entry=2,
cache_size=cache_size@entry=150, rhash=0x635020,
hashsz=hashsz@entry=641) at cache.c:1040
#3  0x00407810 in cache_reload () at cache.c:1185
#4  0x00418037 in clear_cache_and_reload
(now=now@entry=1537265437) at dnsmasq.c:1547
#5  0x00405ec3 in async_event (now=1537265437, pipe=15) at
dnsmasq.c:1310
#6  main (argc=, argv=) at dnsmasq.c:1077

My current work-around is to check if crecp is NULL and then return
from make_non_terminals(). However, I don't know the code well enough
to know if this change will break anything else (though everything
seems to work fine).

There is nothing special with my hosts file, it looks as follows:

127.0.0.1 localhost

::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
192.168.5.1 myrouter.lan

If checking for NULL is acceptable, I will be more than happy to send
my patch. If not, I wonder if anyone has any hints on how to proceed
to solve this issue?

BR,
Kristian

___
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss