Re: [Dnsmasq-discuss] [PATCH] Fix buffer overflow in TCP requests

2020-07-06 Thread Simon Kelley
On 30/06/2020 20:22, Frank wrote:
> Resending this because I realized I sent it to Simon rather than the list:
> 
> Hi Simon,
> 
> This bug is fairly easy to reproduce. It can take 10 mins or more to
> reproduce a crash so I suggest checking the length you get in
> cache_recv_insert. If the domains being used are > 4 characters, this
> will catch the bug the first time it occurs. That usually happens
> within a minute. Something like this:
> 
> if (m > MAXDNAME) abort();
> 

OK, that makes sense.  EINTR or EAGAIN are possible, but rare. Does the
fix in 2.82test1 work for you? If so I think I'll work towards a 2.82
release before doing anything else, just to nail this.


Simon.


> Then bombard dnsmasq with TCP queries:
> 
> docker run --rm -v /root/dns_queries:/dns_queries
> aiys0211/flamethrower_v0.10.2 -f
> /dns_queries/queryfile-example-100thousand -l 1000 -c 20 -q 10 -d 300
> -p 53 -P tcp 
> 
> I used the first 100 thousand entries from here:
> https://www.dns-oarc.net/files/dnsperf/data/queryfile-example-10million-201202.gz
> 
> Frank
> 
> On Sun, Jun 28, 2020 at 1:58 PM Simon Kelley  wrote:
>>
>> That's a nasty bug, and could explain what pi-hole users are seeing.
>>
>> If I understand things correctly, this bug will only manifest itself
>> when the write() or read() syscalls return EINTR ir EAGAIN, which is
>> possible, but not common, hence the bugs wasn't detected earlier.
>>
>> Frank, did you find a way to reproduce this on demand, or just capture
>> sporadic instances?
>>
>> I've applied Frank's patch, but immediately superseded it with a more
>> general clean-up which should have the same effect. I've tagged a
>> 2.82test1 release with this. If that fixes the problem for pi-hole, I
>> plan to go to 2.82 fairly quickly: I don't want this hanging around.
>>
>> It's slightly ironic that the original change in 2.81 is to fix a
>> theoretical hang which I've never actually seen and cannot reproduce,
>> but it introduced a really crash bug.
>>
>>
>> Cheers,
>>
>> Simon.
>>
>>
>>
>>
>> On 18/06/2020 22:42, Dominik wrote:
>>> Hey Frank, dear list-members,
>>>
>>> thanks for your proposed fix. The Pi-hole team adopted dnsmasq v2.81
>>> early on and we're seeing reports for mysterious crashes scattered all
>>> over the dnsmasq code since the very first days of releasing our latest
>>> version. Crashes reports show several locations, e.g., in_zone(),
>>> iface_check() and even main() in a few places. TCP forks were always
>>> happening sometime (not necessarily immediately) before a crash.
>>> Running dnsmasq in debug mode (prevents forking) stopped any crashes.
>>>
>>> So far, we haven't had the time to dig into this ourselves properly,
>>> however, your description perfectly matches with my preliminary
>>> conclusion for one week ago:
 the dangling pointer is in **all** bug reports known so far in the
>>> daemon pointer
>>> (https://github.com/pi-hole/FTL/issues/805#issuecomment-643157367)
>>>
>>> The addition of TCP forks being able to add cache replies in dnsmasq
>>> v2.81 is what is causing this code (which has been like this for along
>>> time) to become problematic. I recently looked at the same code but
>>> missed the chance that the byte may be still in the pipe. This is a
>>> very good catch!
>>>
>>> I'm writing this mail to raise awareness that this appears to be a high
>>> severity bug as it can easily lead to anything between indeterminable
>>> behavior to fatal failures (typically the case). Even when the bug can
>>> only be triggered under certain conditions, it recently caused over 200
>>> posts on the Pi-hole issue ticket system (Github). Crashes due to this
>>> have been reported independently by several (more than 20) individual
>>> users.
>>> Best
>>> Dominik
>>>
>>> On Wed, 2020-06-17 at 19:52 -0700, Frank wrote:
 Hello,

 This patch fixes a buffer overflow in TCP requests. Since the read is
 not actually being retried, the byte written by the child can be left
 in the pipe. When that happens, cache_recv_insert() reads the length
 of the name, which is now multiplied by 256 due to the extra 0 byte
 (8 bit shift) and results in daemon->namebuff being overflowed.
 Namebuff is immediately before the daemon struct in memory so it ends
 up corrupting the beginning of the daemon struct.

 diff --git a/src/dnsmasq.c b/src/dnsmasq.c
 index 6481de8..457dea3 100644
 --- a/src/dnsmasq.c
 +++ b/src/dnsmasq.c
 @@ -1887,7 +1887,7 @@ static void check_dns_listeners(time_t now)
single byte comes back up the pipe, which
is sent by the child after it has closed
 the
netlink socket. */
 -   retry_send(read(pipefd[0], , 1));
 +   while (retry_send(read(pipefd[0], , 1)));
  #endif
 break;
   }
 @@ -1928,7 +1928,7 @@ static void 

Re: [Dnsmasq-discuss] [PATCH] Fix buffer overflow in TCP requests

2020-06-30 Thread Frank
Resending this because I realized I sent it to Simon rather than the list:

Hi Simon,

This bug is fairly easy to reproduce. It can take 10 mins or more to
reproduce a crash so I suggest checking the length you get in
cache_recv_insert. If the domains being used are > 4 characters, this
will catch the bug the first time it occurs. That usually happens
within a minute. Something like this:

if (m > MAXDNAME) abort();

Then bombard dnsmasq with TCP queries:

docker run --rm -v /root/dns_queries:/dns_queries
aiys0211/flamethrower_v0.10.2 -f
/dns_queries/queryfile-example-100thousand -l 1000 -c 20 -q 10 -d 300
-p 53 -P tcp 

I used the first 100 thousand entries from here:
https://www.dns-oarc.net/files/dnsperf/data/queryfile-example-10million-201202.gz

Frank

On Sun, Jun 28, 2020 at 1:58 PM Simon Kelley  wrote:
>
> That's a nasty bug, and could explain what pi-hole users are seeing.
>
> If I understand things correctly, this bug will only manifest itself
> when the write() or read() syscalls return EINTR ir EAGAIN, which is
> possible, but not common, hence the bugs wasn't detected earlier.
>
> Frank, did you find a way to reproduce this on demand, or just capture
> sporadic instances?
>
> I've applied Frank's patch, but immediately superseded it with a more
> general clean-up which should have the same effect. I've tagged a
> 2.82test1 release with this. If that fixes the problem for pi-hole, I
> plan to go to 2.82 fairly quickly: I don't want this hanging around.
>
> It's slightly ironic that the original change in 2.81 is to fix a
> theoretical hang which I've never actually seen and cannot reproduce,
> but it introduced a really crash bug.
>
>
> Cheers,
>
> Simon.
>
>
>
>
> On 18/06/2020 22:42, Dominik wrote:
> > Hey Frank, dear list-members,
> >
> > thanks for your proposed fix. The Pi-hole team adopted dnsmasq v2.81
> > early on and we're seeing reports for mysterious crashes scattered all
> > over the dnsmasq code since the very first days of releasing our latest
> > version. Crashes reports show several locations, e.g., in_zone(),
> > iface_check() and even main() in a few places. TCP forks were always
> > happening sometime (not necessarily immediately) before a crash.
> > Running dnsmasq in debug mode (prevents forking) stopped any crashes.
> >
> > So far, we haven't had the time to dig into this ourselves properly,
> > however, your description perfectly matches with my preliminary
> > conclusion for one week ago:
> >> the dangling pointer is in **all** bug reports known so far in the
> > daemon pointer
> > (https://github.com/pi-hole/FTL/issues/805#issuecomment-643157367)
> >
> > The addition of TCP forks being able to add cache replies in dnsmasq
> > v2.81 is what is causing this code (which has been like this for along
> > time) to become problematic. I recently looked at the same code but
> > missed the chance that the byte may be still in the pipe. This is a
> > very good catch!
> >
> > I'm writing this mail to raise awareness that this appears to be a high
> > severity bug as it can easily lead to anything between indeterminable
> > behavior to fatal failures (typically the case). Even when the bug can
> > only be triggered under certain conditions, it recently caused over 200
> > posts on the Pi-hole issue ticket system (Github). Crashes due to this
> > have been reported independently by several (more than 20) individual
> > users.
> > Best
> > Dominik
> >
> > On Wed, 2020-06-17 at 19:52 -0700, Frank wrote:
> >> Hello,
> >>
> >> This patch fixes a buffer overflow in TCP requests. Since the read is
> >> not actually being retried, the byte written by the child can be left
> >> in the pipe. When that happens, cache_recv_insert() reads the length
> >> of the name, which is now multiplied by 256 due to the extra 0 byte
> >> (8 bit shift) and results in daemon->namebuff being overflowed.
> >> Namebuff is immediately before the daemon struct in memory so it ends
> >> up corrupting the beginning of the daemon struct.
> >>
> >> diff --git a/src/dnsmasq.c b/src/dnsmasq.c
> >> index 6481de8..457dea3 100644
> >> --- a/src/dnsmasq.c
> >> +++ b/src/dnsmasq.c
> >> @@ -1887,7 +1887,7 @@ static void check_dns_listeners(time_t now)
> >>single byte comes back up the pipe, which
> >>is sent by the child after it has closed
> >> the
> >>netlink socket. */
> >> -   retry_send(read(pipefd[0], , 1));
> >> +   while (retry_send(read(pipefd[0], , 1)));
> >>  #endif
> >> break;
> >>   }
> >> @@ -1928,7 +1928,7 @@ static void check_dns_listeners(time_t now)
> >>  #ifdef HAVE_LINUX_NETWORK
> >>   /* See comment above re netlink socket. */
> >>   close(daemon->netlinkfd);
> >> - retry_send(write(pipefd[1], , 1));
> >> + while (retry_send(write(pipefd[1], , 1)));
> >>  #endif
> >>   

Re: [Dnsmasq-discuss] [PATCH] Fix buffer overflow in TCP requests

2020-06-28 Thread Simon Kelley
That's a nasty bug, and could explain what pi-hole users are seeing.

If I understand things correctly, this bug will only manifest itself
when the write() or read() syscalls return EINTR ir EAGAIN, which is
possible, but not common, hence the bugs wasn't detected earlier.

Frank, did you find a way to reproduce this on demand, or just capture
sporadic instances?

I've applied Frank's patch, but immediately superseded it with a more
general clean-up which should have the same effect. I've tagged a
2.82test1 release with this. If that fixes the problem for pi-hole, I
plan to go to 2.82 fairly quickly: I don't want this hanging around.

It's slightly ironic that the original change in 2.81 is to fix a
theoretical hang which I've never actually seen and cannot reproduce,
but it introduced a really crash bug.


Cheers,

Simon.




On 18/06/2020 22:42, Dominik wrote:
> Hey Frank, dear list-members,
> 
> thanks for your proposed fix. The Pi-hole team adopted dnsmasq v2.81
> early on and we're seeing reports for mysterious crashes scattered all
> over the dnsmasq code since the very first days of releasing our latest
> version. Crashes reports show several locations, e.g., in_zone(),
> iface_check() and even main() in a few places. TCP forks were always
> happening sometime (not necessarily immediately) before a crash.
> Running dnsmasq in debug mode (prevents forking) stopped any crashes.
> 
> So far, we haven't had the time to dig into this ourselves properly,
> however, your description perfectly matches with my preliminary
> conclusion for one week ago:
>> the dangling pointer is in **all** bug reports known so far in the
> daemon pointer
> (https://github.com/pi-hole/FTL/issues/805#issuecomment-643157367)
> 
> The addition of TCP forks being able to add cache replies in dnsmasq
> v2.81 is what is causing this code (which has been like this for along
> time) to become problematic. I recently looked at the same code but
> missed the chance that the byte may be still in the pipe. This is a
> very good catch!
> 
> I'm writing this mail to raise awareness that this appears to be a high
> severity bug as it can easily lead to anything between indeterminable
> behavior to fatal failures (typically the case). Even when the bug can
> only be triggered under certain conditions, it recently caused over 200
> posts on the Pi-hole issue ticket system (Github). Crashes due to this
> have been reported independently by several (more than 20) individual
> users.
> Best
> Dominik
> 
> On Wed, 2020-06-17 at 19:52 -0700, Frank wrote:
>> Hello,
>>
>> This patch fixes a buffer overflow in TCP requests. Since the read is
>> not actually being retried, the byte written by the child can be left
>> in the pipe. When that happens, cache_recv_insert() reads the length
>> of the name, which is now multiplied by 256 due to the extra 0 byte
>> (8 bit shift) and results in daemon->namebuff being overflowed.
>> Namebuff is immediately before the daemon struct in memory so it ends
>> up corrupting the beginning of the daemon struct.
>>
>> diff --git a/src/dnsmasq.c b/src/dnsmasq.c
>> index 6481de8..457dea3 100644
>> --- a/src/dnsmasq.c
>> +++ b/src/dnsmasq.c
>> @@ -1887,7 +1887,7 @@ static void check_dns_listeners(time_t now)
>>single byte comes back up the pipe, which
>>is sent by the child after it has closed
>> the
>>netlink socket. */
>> -   retry_send(read(pipefd[0], , 1));
>> +   while (retry_send(read(pipefd[0], , 1)));
>>  #endif
>> break;
>>   }
>> @@ -1928,7 +1928,7 @@ static void check_dns_listeners(time_t now)
>>  #ifdef HAVE_LINUX_NETWORK
>>   /* See comment above re netlink socket. */
>>   close(daemon->netlinkfd);
>> - retry_send(write(pipefd[1], , 1));
>> + while (retry_send(write(pipefd[1], , 1)));
>>  #endif
>> }
>>
>> Frank
>>
>>
>> ___
>> Dnsmasq-discuss mailing list
>> Dnsmasq-discuss@lists.thekelleys.org.uk
>> http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss
> 
> 
> ___
> Dnsmasq-discuss mailing list
> Dnsmasq-discuss@lists.thekelleys.org.uk
> http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss
> 


___
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss


Re: [Dnsmasq-discuss] [PATCH] Fix buffer overflow in TCP requests

2020-06-18 Thread Dominik
Hey Frank, dear list-members,

thanks for your proposed fix. The Pi-hole team adopted dnsmasq v2.81
early on and we're seeing reports for mysterious crashes scattered all
over the dnsmasq code since the very first days of releasing our latest
version. Crashes reports show several locations, e.g., in_zone(),
iface_check() and even main() in a few places. TCP forks were always
happening sometime (not necessarily immediately) before a crash.
Running dnsmasq in debug mode (prevents forking) stopped any crashes.

So far, we haven't had the time to dig into this ourselves properly,
however, your description perfectly matches with my preliminary
conclusion for one week ago:
> the dangling pointer is in **all** bug reports known so far in the
daemon pointer
(https://github.com/pi-hole/FTL/issues/805#issuecomment-643157367)

The addition of TCP forks being able to add cache replies in dnsmasq
v2.81 is what is causing this code (which has been like this for along
time) to become problematic. I recently looked at the same code but
missed the chance that the byte may be still in the pipe. This is a
very good catch!

I'm writing this mail to raise awareness that this appears to be a high
severity bug as it can easily lead to anything between indeterminable
behavior to fatal failures (typically the case). Even when the bug can
only be triggered under certain conditions, it recently caused over 200
posts on the Pi-hole issue ticket system (Github). Crashes due to this
have been reported independently by several (more than 20) individual
users.
Best
Dominik

On Wed, 2020-06-17 at 19:52 -0700, Frank wrote:
> Hello,
> 
> This patch fixes a buffer overflow in TCP requests. Since the read is
> not actually being retried, the byte written by the child can be left
> in the pipe. When that happens, cache_recv_insert() reads the length
> of the name, which is now multiplied by 256 due to the extra 0 byte
> (8 bit shift) and results in daemon->namebuff being overflowed.
> Namebuff is immediately before the daemon struct in memory so it ends
> up corrupting the beginning of the daemon struct.
> 
> diff --git a/src/dnsmasq.c b/src/dnsmasq.c
> index 6481de8..457dea3 100644
> --- a/src/dnsmasq.c
> +++ b/src/dnsmasq.c
> @@ -1887,7 +1887,7 @@ static void check_dns_listeners(time_t now)
>single byte comes back up the pipe, which
>is sent by the child after it has closed
> the
>netlink socket. */
> -   retry_send(read(pipefd[0], , 1));
> +   while (retry_send(read(pipefd[0], , 1)));
>  #endif
> break;
>   }
> @@ -1928,7 +1928,7 @@ static void check_dns_listeners(time_t now)
>  #ifdef HAVE_LINUX_NETWORK
>   /* See comment above re netlink socket. */
>   close(daemon->netlinkfd);
> - retry_send(write(pipefd[1], , 1));
> + while (retry_send(write(pipefd[1], , 1)));
>  #endif
> }
> 
> Frank
> 
> 
> ___
> Dnsmasq-discuss mailing list
> Dnsmasq-discuss@lists.thekelleys.org.uk
> http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss


___
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss