Re: [Dnsmasq-discuss] dnsmasq using 100% of cpu

2024-05-06 Thread Kenneth Berland
I think this was my fault. I had two processes running. One from systemd (I
hate you, systemd) and one in the foreground. I haven't been able to
reproduce it with a single process.

-KB

On Mon, May 6, 2024 at 2:51 AM Simon Kelley wrote:

> Very suspicious of listen-address=127.0.0.9. Are you sure you've not
> created a loop where dnsmasq is sending queries back to itself?
>
> Enabling logging, and/or --dns-loop-detect would be useful.
>
>
> Cheers, Simon.
>
> On 5/1/24 23:47, Kenneth Berland wrote:
> > On March 2, 2020 (possibly causing the Pandemic?), there was a thread
> > with this name that went unresolved. I'm facing the same issue with
> > dnsmasq-2.90 and the following configuration. After about 10 minutes,
> > dnsmasq starts to consume 100% of the CPU. I'm running like this:
> >
> > $ wget https://thekelleys.org.uk/dnsmasq/dnsmasq-2.90.tar.xz
> > 
> > $ tar -xf dnsmasq-2.90.tar.xz
> > $ cd dnsmasq-2.90
> > $ make
> > $ sudo ./src/dnsmasq -k
> > $ cat /etc/dnsmasq.conf | grep -v ^# | awk NF
> > address=/run.app/199.36.153.11 
> > bind-interfaces
> > listen-address=127.0.0.9
> >
> > -KB
___
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
https://lists.thekelleys.org.uk/cgi-bin/mailman/listinfo/dnsmasq-discuss


Re: [Dnsmasq-discuss] dnsmasq using 100% of cpu

2024-05-06 Thread Simon Kelley
Very suspicious of listen-address=127.0.0.9. Are you sure you've not
created a loop where dnsmasq is sending queries back to itself?


Enabling logging, and/or --dns-loop-detect would be useful.


Cheers, Simon.

On 5/1/24 23:47, Kenneth Berland wrote:
On March 2, 2020 (possibly causing the Pandemic?), there was a thread 
with this name that went unresolved. I'm facing the same issue with 
dnsmasq-2.90 and the following configuration. After about 10 minutes, 
dnsmasq starts to consume 100% of the CPU. I'm running like this:


$ wget https://thekelleys.org.uk/dnsmasq/dnsmasq-2.90.tar.xz 


$ tar -xf dnsmasq-2.90.tar.xz
$ cd dnsmasq-2.90
$ make
$ sudo ./src/dnsmasq -k
$ cat /etc/dnsmasq.conf | grep -v ^# | awk NF
address=/run.app/199.36.153.11 
bind-interfaces
listen-address=127.0.0.9

-KB



Re: [Dnsmasq-discuss] dnsmasq using 100% of cpu

2020-03-02 Thread Simon Kelley
On 02/03/2020 22:00, Geert Stappers wrote:
> On Mon, Feb 17, 2020 at 08:32:49PM +, Simon Kelley wrote:
>>
>>
>> On 17/02/2020 13:31, Donald Sharp wrote:
>>> Running:
>>>
>>> sharpd@eva:~/dnsmasq$ /sbin/dnsmasq --version
>>> Dnsmasq version 2.80  Copyright (c) 2000-2018 Simon Kelley
>>> Compile time options: IPv6 GNU-getopt DBus i18n IDN DHCP DHCPv6 no-Lua
>>> TFTP conntrack ipset auth DNSSEC loop-detect inotify dumpfile
>>> 
>>>
>>> When I install several hundred thousand routes into the kernel and
>>> remove them( or some variation thereof ), dnsmasq eventually ends up
>>> running 100% cpu:
>>>
>>> top - 18:45:18 up 1 day,  7:44,  1 user,  load average: 2.70, 2.65, 2.34
>>> Tasks: 424 total,   3 running, 421 sleeping,   0 stopped,   0 zombie
>>> %Cpu(s): 12.1 us,  6.9 sy,  0.0 ni, 80.2 id,  0.0 wa,  0.0 hi,  0.7 si,
>>>  0.0 st
>>> MiB Mem :  32131.3 total,  19483.6 free,   6620.3 used,   6027.4 buff/cache
>>> MiB Swap:  32718.0 total,  31693.0 free,   1025.0 used.  24698.2 avail Mem
>>>
>>>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+
>>> COMMAND                            
>>>  293183 nobody    20   0   11040   2040   1688 R  99.7   0.0 148:48.40
>>> dnsmasq        
>>>
>>> strace output:
>>>
>>> poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5,
>>> events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8,
>>> events=POLLIN}], 6, -1) = 1 ([{fd=4, revents=POLLERR}])
>>> poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5,
>>> events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8,
> ...
>>> poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5,
>>> events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8,
>>> events=POLLIN}], 6, -1) = 1 ([{fd=4, revents=PO^Cstrace: Process 293183
>>> detached
>>>
>>> I can pretty much make this happen at will.  What can I provide to help
>>> debug this?
>>
>> The first thing I'd like to know is what file descriptor 4 is. Providing
>> us with the first (say) 500 or 1000 lines of strace output would help
>> with that.
>>
>>
>>>
>>> As a side note, I was not placing these routes into the default linux
>>> routing table.  Does dnsmasq need to be paying attention to these routes?
>>>
>>
>>
>> To save typing I've just pasted a comment from the code which explains
>> why adding routes affects dnsmasq:
>>
>>  /* We arrange to receive netlink multicast messages whenever the
>>     network route is added. If this happens and we still have a DNS
>>     packet in the buffer, we re-send it. This helps on DoD links,
>>     where frequently the packet which triggers dialling is a DNS
>>     query, which then gets lost. By re-sending, we can avoid the
>>     lookup failing. */
>>
>>
>> I suspect that the solution to this is to restrict the above to the
>> "main" routing table.
>>
> 
> Matching that with "[PATCH] Ignore routes in non-main tables"
> ( http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2020q1/013824.html )
> 
> 


That's a good solution. Donald, if you could supply an answer to the
question about what fd 4 is, that would still be useful too.


Simon.
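
For readers wondering how the "what is fd 4" question could be answered on
Linux: each open descriptor of a process appears as a symlink under
/proc/<pid>/fd/. The sketch below reads that link. The helper name
describe_fd is hypothetical and is not part of dnsmasq; it assumes /proc is
mounted and that the caller may inspect the target process (same user or
root).

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (Linux only, not dnsmasq code): resolve
   /proc/<pid>/fd/<fd> to see what a descriptor refers to. Sockets show up
   as "socket:[inode]"; regular files show their path.
   Returns 0 on success, -1 on failure. */
int describe_fd(pid_t pid, int fd, char *out, size_t outlen)
{
  char path[64];
  ssize_t n;

  if (outlen == 0)
    return -1;

  snprintf(path, sizeof(path), "/proc/%ld/fd/%d", (long)pid, fd);

  n = readlink(path, out, outlen - 1);
  if (n == -1)
    return -1;

  out[n] = '\0';   /* readlink() does not NUL-terminate the result */
  return 0;
}
```

For the process in the report this would be called as
describe_fd(293183, 4, buf, sizeof(buf)). A "socket:[inode]" result only
says that the descriptor is a socket; working out which one (netlink, DHCP,
DNS) still needs the inode to be matched against the kernel's socket
tables, or the early strace output Simon asked for.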
> 
> Regards
> Geert Stappers
> 




Re: [Dnsmasq-discuss] dnsmasq using 100% of cpu

2020-03-02 Thread Geert Stappers
On Thu, Feb 20, 2020 at 10:49:35PM +, Simon Kelley wrote:
> On 17/02/2020 14:37, Geert Stappers wrote:
> > On 17-02-2020 14:31, Donald Sharp wrote:
> > 
> >> Running:
> >>
> >> sharpd@eva:~/dnsmasq$ /sbin/dnsmasq --version
> >> Dnsmasq version 2.80  Copyright (c) 2000-2018 Simon Kelley
> > 
> > 2018, no short git hashes nor similar indicators of the source version.
> > 
> > 
> >> Compile time options: IPv6 GNU-getopt DBus i18n IDN DHCP DHCPv6 no-Lua
> >> TFTP conntrack ipset auth DNSSEC loop-detect inotify dumpfile
> >> 
> >>
> >> When I install several hundred thousand routes into the kernel and
> >> remove them( or some variation thereof ), dnsmasq eventually ends up
> >> running 100% cpu:
> >>
> >> top - 18:45:18 up 1 day,  7:44,  1 user,  load average: 2.70, 2.65, 2.34
> >> Tasks: 424 total,   3 running, 421 sleeping,   0 stopped,   0 zombie
> >> %Cpu(s): 12.1 us,  6.9 sy,  0.0 ni, 80.2 id,  0.0 wa,  0.0 hi,  0.7
> >> si,  0.0 st
> >> MiB Mem :  32131.3 total,  19483.6 free,   6620.3 used,   6027.4
> >> buff/cache
> >> MiB Swap:  32718.0 total,  31693.0 free,   1025.0 used.  24698.2 avail Mem
> >>
> >>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+
> >> COMMAND                            
> >>  293183 nobody    20   0   11040   2040   1688 R  99.7   0.0 148:48.40
> >> dnsmasq       
> > 
> > 
> > The "CPU 100%" made me do  `git log` and a "find" on 'CPU'.  I found
> > 
> > 
> > commit df6636bff61aa53ed7ad4b34d940805193c0bc74
> > Author: Florent Fourcot 
> > Date:   Mon Feb 11 17:04:44 2019 +0100
> > 
> >     lease: prune lease as soon as expired
> >    
> >     We detected a performance issue on a dnsmasq running many dhcp sessions
> >     (more than 10 000). At the end of the day, the server was only releasing
> >     old DHCP leases but was consuming a lot of CPU.
> >    
> >     It looks like curent dhcp pruning:
> >  1) it's pruning old sessions (iterate on all current leases). It's
> >  important to note that it's only pruning session expired since more
> >  than one second
> >  2) it's looking for next lease to expire (iterate on all current leases
> >  again)
> >  3) it launchs an alarm to catch next expiration found in step 2). This
> >  value can be zero for leases just expired (but not pruned).
> >    
> >     So, for a second, dnsmasq could fall in a "prune loop" by doing:
> >  * Not pruning anything, since difftime() is not > 0
> >  * Run alarm again with zero as argument
> >    
> >     On a server with very large number of leases and releasing often
> >     sessions, that can waste a very big CPU time.
> >    
> >     Signed-off-by: Florent Fourcot 
> > 
> > 
> > 
> > 
> >>
> >> strace output:
> >>
> >> poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5,
> >> events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8,
> >> events=POLLIN}], 6, -1) = 1 ([{fd=4, revents=POLLERR}])
> >>     
> >> poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5,
> >> events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8,
> >> events=POLLIN}], 6, -1) = 1 ([{fd=4, revents=POLLERR}])
> >> poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5,
> >> events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8,
> >> events=POLLIN}], 6, -1) = 1 ([{fd=4, revents=PO^Cstrace: Process
> >> 293183 detached
> >>
> >> I can pretty much make this happen at will.  What can I provide to
> >> help debug this?
> > 
> > Start with stating how recent the source is that you are using.
> > 
> > 
> >>
> >> As a side note, I was not placing these routes into the default linux
> >> routing table.  Does dnsmasq need to be paying attention to these routes?
> > 
> > Side notes in a separate thread  please.
> > 
> > 
> >>
> >> donald
> >>
> > 
> > Regards
> > 
> > Geert Stappers
> > 
> 
> Geert, you're confusing things.

Sorry for matching  CPU load  with CPU load.


> It's perfectly clear that the process is
> running 100% CPU because the poll() calls are returning an error which
> the code is not expecting and doesn't handle. It just calls poll()
> again, and because the error wasn't cleared, poll returns immediately
> again, rinse and repeat.
> 
> The solution is to handle the error (it's not obvious to me how to do
> that) or to avoid creating the error condition in the first place.
> 
> To get further, we need to know which socket is erroring. It's file
> descriptor four in the strace, but is that the netlink socket, a DHCP
> socket, a socket used to talk DNS upstream, or one used for DNS
> downstream? We don't know without further information.


Geert Stappers
-- 
Silence is hard to parse



Re: [Dnsmasq-discuss] dnsmasq using 100% of cpu

2020-03-02 Thread Geert Stappers
On Mon, Feb 17, 2020 at 08:32:49PM +, Simon Kelley wrote:
> 
> 
> On 17/02/2020 13:31, Donald Sharp wrote:
> > Running:
> > 
> > sharpd@eva:~/dnsmasq$ /sbin/dnsmasq --version
> > Dnsmasq version 2.80  Copyright (c) 2000-2018 Simon Kelley
> > Compile time options: IPv6 GNU-getopt DBus i18n IDN DHCP DHCPv6 no-Lua
> > TFTP conntrack ipset auth DNSSEC loop-detect inotify dumpfile
> > 
> > 
> > When I install several hundred thousand routes into the kernel and
> > remove them( or some variation thereof ), dnsmasq eventually ends up
> > running 100% cpu:
> > 
> > top - 18:45:18 up 1 day,  7:44,  1 user,  load average: 2.70, 2.65, 2.34
> > Tasks: 424 total,   3 running, 421 sleeping,   0 stopped,   0 zombie
> > %Cpu(s): 12.1 us,  6.9 sy,  0.0 ni, 80.2 id,  0.0 wa,  0.0 hi,  0.7 si,
> >  0.0 st
> > MiB Mem :  32131.3 total,  19483.6 free,   6620.3 used,   6027.4 buff/cache
> > MiB Swap:  32718.0 total,  31693.0 free,   1025.0 used.  24698.2 avail Mem
> > 
> >     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+
> > COMMAND                            
> >  293183 nobody    20   0   11040   2040   1688 R  99.7   0.0 148:48.40
> > dnsmasq        
> > 
> > strace output:
> > 
> > poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5,
> > events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8,
> > events=POLLIN}], 6, -1) = 1 ([{fd=4, revents=POLLERR}])
> > poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5,
> > events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8,
...
> > poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5,
> > events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8,
> > events=POLLIN}], 6, -1) = 1 ([{fd=4, revents=PO^Cstrace: Process 293183
> > detached
> > 
> > I can pretty much make this happen at will.  What can I provide to help
> > debug this?
> 
> The first thing I'd like to know is what file descriptor 4 is. Providing
> us with the first (say) 500 or 1000 lines of strace output would help
> with that.
> 
> 
> > 
> > As a side note, I was not placing these routes into the default linux
> > routing table.  Does dnsmasq need to be paying attention to these routes?
> > 
> 
> 
> To save typing I've just pasted a comment from the code which explains
> why adding routes affects dnsmasq:
> 
>  /* We arrange to receive netlink multicast messages whenever the
>     network route is added. If this happens and we still have a DNS
>     packet in the buffer, we re-send it. This helps on DoD links,
>     where frequently the packet which triggers dialling is a DNS
>     query, which then gets lost. By re-sending, we can avoid the
>     lookup failing. */
> 
> 
> I suspect that the solution to this is to restrict the above to the
> "main" routing table.
> 

Matching that with "[PATCH] Ignore routes in non-main tables"
 ( http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2020q1/013824.html )
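
As an illustration of what "restrict to the main routing table" can look
like at the netlink level, here is a minimal sketch. It is not the code
from the patch linked above; the function name is hypothetical, and it
only assumes the standard Linux rtnetlink message layout (a struct rtmsg
following the netlink header, with the table ID in rtm_table).

```c
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical predicate (not taken from the referenced patch): returns 1
   if the netlink message announces a new route in the main routing table,
   and 0 for any other message type or table. */
int is_new_main_table_route(const struct nlmsghdr *h)
{
  const struct rtmsg *rtm;

  if (h->nlmsg_type != RTM_NEWROUTE)
    return 0;

  /* Make sure the payload is large enough to hold a struct rtmsg. */
  if (h->nlmsg_len < NLMSG_LENGTH(sizeof(struct rtmsg)))
    return 0;

  rtm = (const struct rtmsg *)NLMSG_DATA(h);
  return rtm->rtm_table == RT_TABLE_MAIN;
}
```

A caller that only wants to re-send a buffered DNS query for main-table
routes would skip every message for which this returns 0. Table IDs that
do not fit in the one-byte rtm_table field are carried in a separate
RTA_TABLE attribute, which this sketch does not parse; such routes are
never in the main table, so they are still rejected.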



Regards
Geert Stappers
-- 
Silence is hard to parse



Re: [Dnsmasq-discuss] dnsmasq using 100% of cpu

2020-02-20 Thread Simon Kelley
On 17/02/2020 14:37, Geert Stappers wrote:
> On 17-02-2020 14:31, Donald Sharp wrote:
> 
>> Running:
>>
>> sharpd@eva:~/dnsmasq$ /sbin/dnsmasq --version
>> Dnsmasq version 2.80  Copyright (c) 2000-2018 Simon Kelley
> 
> 2018, no short git hashes nor similar indicators of the source version.
> 
> 
>> Compile time options: IPv6 GNU-getopt DBus i18n IDN DHCP DHCPv6 no-Lua
>> TFTP conntrack ipset auth DNSSEC loop-detect inotify dumpfile
>> 
>>
>> When I install several hundred thousand routes into the kernel and
>> remove them( or some variation thereof ), dnsmasq eventually ends up
>> running 100% cpu:
>>
>> top - 18:45:18 up 1 day,  7:44,  1 user,  load average: 2.70, 2.65, 2.34
>> Tasks: 424 total,   3 running, 421 sleeping,   0 stopped,   0 zombie
>> %Cpu(s): 12.1 us,  6.9 sy,  0.0 ni, 80.2 id,  0.0 wa,  0.0 hi,  0.7
>> si,  0.0 st
>> MiB Mem :  32131.3 total,  19483.6 free,   6620.3 used,   6027.4
>> buff/cache
>> MiB Swap:  32718.0 total,  31693.0 free,   1025.0 used.  24698.2 avail Mem
>>
>>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+
>> COMMAND                            
>>  293183 nobody    20   0   11040   2040   1688 R  99.7   0.0 148:48.40
>> dnsmasq       
> 
> 
> The "CPU 100%" made me do  `git log` and a "find" on 'CPU'.  I found
> 
> 
> commit df6636bff61aa53ed7ad4b34d940805193c0bc74
> Author: Florent Fourcot 
> Date:   Mon Feb 11 17:04:44 2019 +0100
> 
>     lease: prune lease as soon as expired
>    
>     We detected a performance issue on a dnsmasq running many dhcp sessions
>     (more than 10 000). At the end of the day, the server was only releasing
>     old DHCP leases but was consuming a lot of CPU.
>    
>     It looks like curent dhcp pruning:
>  1) it's pruning old sessions (iterate on all current leases). It's
>  important to note that it's only pruning session expired since more
>  than one second
>  2) it's looking for next lease to expire (iterate on all current leases
>  again)
>  3) it launchs an alarm to catch next expiration found in step 2). This
>  value can be zero for leases just expired (but not pruned).
>    
>     So, for a second, dnsmasq could fall in a "prune loop" by doing:
>  * Not pruning anything, since difftime() is not > 0
>  * Run alarm again with zero as argument
>    
>     On a server with very large number of leases and releasing often
>     sessions, that can waste a very big CPU time.
>    
>     Signed-off-by: Florent Fourcot 
> 
> 
> 
> 
>>
>> strace output:
>>
>> poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5,
>> events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8,
>> events=POLLIN}], 6, -1) = 1 ([{fd=4, revents=POLLERR}])
>>     
>> poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5,
>> events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8,
>> events=POLLIN}], 6, -1) = 1 ([{fd=4, revents=POLLERR}])
>> poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5,
>> events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8,
>> events=POLLIN}], 6, -1) = 1 ([{fd=4, revents=PO^Cstrace: Process
>> 293183 detached
>>
>> I can pretty much make this happen at will.  What can I provide to
>> help debug this?
> 
> Start with stating how recent the source is that you are using.
> 
> 
>>
>> As a side note, I was not placing these routes into the default linux
>> routing table.  Does dnsmasq need to be paying attention to these routes?
> 
> Side notes in a separate thread  please.
> 
> 
>>
>> donald
>>
> 
> Regards
> 
> Geert Stappers
> 

Geert, you're confusing things. It's perfectly clear that the process is
running 100% CPU because the poll() calls are returning an error which
the code is not expecting and doesn't handle. It just calls poll()
again, and because the error wasn't cleared, poll() returns immediately
again, rinse and repeat.

The solution is to handle the error (it's not obvious to me how to do
that) or to avoid creating the error condition in the first place.

To get further, we need to know which socket is erroring. It's file
descriptor four in the strace, but is that the netlink socket, a DHCP
socket, a socket used to talk DNS upstream, or one used for DNS
downstream? We don't know without further information.

Simon.
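
One generic way to "handle the error" on a socket, sketched here under
assumptions: when poll() reports POLLERR, the pending error can be fetched
with getsockopt(SO_ERROR), which also clears it so that the next poll()
can block again. The helper name clear_socket_error is hypothetical and
this is not dnsmasq code. Whether this is the right remedy for the
descriptor in the strace is not established in this thread, since it is
not yet known what fd 4 is.

```c
#include <poll.h>
#include <sys/socket.h>

/* Hypothetical helper, not taken from dnsmasq: if poll() flagged POLLERR
   on this descriptor, read the pending socket error with SO_ERROR, which
   also clears it. Returns the errno value that was pending (0 if none),
   or -1 if getsockopt() itself failed, e.g. for a bad descriptor. */
int clear_socket_error(const struct pollfd *pfd)
{
  int err = 0;
  socklen_t len = sizeof(err);

  if (!(pfd->revents & POLLERR))
    return 0;

  if (getsockopt(pfd->fd, SOL_SOCKET, SO_ERROR, &err, &len) == -1)
    return -1;

  return err;
}
```

In an event loop this would be called for each entry after poll()
returns, before the usual POLLIN handling. Logging the returned error code
would also show why the descriptor was in an error state.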




Re: [Dnsmasq-discuss] dnsmasq using 100% of cpu

2020-02-17 Thread Simon Kelley


On 17/02/2020 13:31, Donald Sharp wrote:
> Running:
> 
> sharpd@eva:~/dnsmasq$ /sbin/dnsmasq --version
> Dnsmasq version 2.80  Copyright (c) 2000-2018 Simon Kelley
> Compile time options: IPv6 GNU-getopt DBus i18n IDN DHCP DHCPv6 no-Lua
> TFTP conntrack ipset auth DNSSEC loop-detect inotify dumpfile
> 
> 
> When I install several hundred thousand routes into the kernel and
> remove them (or some variation thereof), dnsmasq eventually ends up
> running 100% cpu:
> 
> top - 18:45:18 up 1 day,  7:44,  1 user,  load average: 2.70, 2.65, 2.34
> Tasks: 424 total,   3 running, 421 sleeping,   0 stopped,   0 zombie
> %Cpu(s): 12.1 us,  6.9 sy,  0.0 ni, 80.2 id,  0.0 wa,  0.0 hi,  0.7 si,  0.0 st
> MiB Mem :  32131.3 total,  19483.6 free,   6620.3 used,   6027.4 buff/cache
> MiB Swap:  32718.0 total,  31693.0 free,   1025.0 used.  24698.2 avail Mem
> 
>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>  293183 nobody    20   0   11040   2040   1688 R  99.7   0.0 148:48.40 dnsmasq
> 
> strace output:
> 
> poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5,
> events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8,
> events=POLLIN}], 6, -1) = 1 ([{fd=4, revents=POLLERR}])
> poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5,
> events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8,
> events=POLLIN}], 6, -1) = 1 ([{fd=4, revents=POLLERR}])
> poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5,
> events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8,
> events=POLLIN}], 6, -1) = 1 ([{fd=4, revents=POLLERR}])
> poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5,
> events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8,
> events=POLLIN}], 6, -1) = 1 ([{fd=4, revents=POLLERR}])
> poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5,
> events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8,
> events=POLLIN}], 6, -1) = 1 ([{fd=4, revents=POLLERR}])
> poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5,
> events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8,
> events=POLLIN}], 6, -1) = 1 ([{fd=4, revents=POLLERR}])
> poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5,
> events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8,
> events=POLLIN}], 6, -1) = 1 ([{fd=4, revents=POLLERR}])
> poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5,
> events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8,
> events=POLLIN}], 6, -1) = 1 ([{fd=4, revents=POLLERR}])
> poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5,
> events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8,
> events=POLLIN}], 6, -1) = 1 ([{fd=4, revents=POLLERR}])
> poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5,
> events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8,
> events=POLLIN}], 6, -1) = 1 ([{fd=4, revents=POLLERR}])
> poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5,
> events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8,
> events=POLLIN}], 6, -1) = 1 ([{fd=4, revents=POLLERR}])
> poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5,
> events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8,
> events=POLLIN}], 6, -1) = 1 ([{fd=4, revents=POLLERR}])
> poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5,
> events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8,
> events=POLLIN}], 6, -1) = 1 ([{fd=4, revents=POLLERR}])
> poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5,
> events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8,
> events=POLLIN}], 6, -1) = 1 ([{fd=4, revents=POLLERR}])
> poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5,
> events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8,
> events=POLLIN}], 6, -1) = 1 ([{fd=4, revents=POLLERR}])
> poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5,
> events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8,
> events=POLLIN}], 6, -1) = 1 ([{fd=4, revents=POLLERR}])
> poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5,
> events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8,
> events=POLLIN}], 6, -1) = 1 ([{fd=4, revents=POLLERR}])
> poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5,
> events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8,
> events=POLLIN}], 6, -1) = 1 ([{fd=4, revents=POLLERR}])
> poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5,
> events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8,
> events=POLLIN}], 6, -1) = 1 ([{fd=4, revents=POLLERR}])
> poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5,
> events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8,
> events=POLLIN}], 6, -1) = 1 ([{fd=4, revents=POLLERR}])
> poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5,
> events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, 

Re: [Dnsmasq-discuss] dnsmasq using 100% of cpu

2020-02-17 Thread Geert Stappers
On 17-02-2020 14:31, Donald Sharp wrote:

> Running:
>
> sharpd@eva:~/dnsmasq$ /sbin/dnsmasq --version
> Dnsmasq version 2.80  Copyright (c) 2000-2018 Simon Kelley

2018, no short git hashes nor similar indicators of the source version.


> Compile time options: IPv6 GNU-getopt DBus i18n IDN DHCP DHCPv6 no-Lua
> TFTP conntrack ipset auth DNSSEC loop-detect inotify dumpfile
> 
>
> When I install several hundred thousand routes into the kernel and
> remove them (or some variation thereof), dnsmasq eventually ends up
> running 100% cpu:
>
> top - 18:45:18 up 1 day,  7:44,  1 user,  load average: 2.70, 2.65, 2.34
> Tasks: 424 total,   3 running, 421 sleeping,   0 stopped,   0 zombie
> %Cpu(s): 12.1 us,  6.9 sy,  0.0 ni, 80.2 id,  0.0 wa,  0.0 hi,  0.7 si,  0.0 st
> MiB Mem :  32131.3 total,  19483.6 free,   6620.3 used,   6027.4 buff/cache
> MiB Swap:  32718.0 total,  31693.0 free,   1025.0 used.  24698.2 avail Mem
>
>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>  293183 nobody    20   0   11040   2040   1688 R  99.7   0.0 148:48.40 dnsmasq


The "CPU 100%" made me do  `git log` and a "find" on 'CPU'.  I found


commit df6636bff61aa53ed7ad4b34d940805193c0bc74
Author: Florent Fourcot 
Date:   Mon Feb 11 17:04:44 2019 +0100

    lease: prune lease as soon as expired
   
    We detected a performance issue on a dnsmasq running many dhcp sessions
    (more than 10 000). At the end of the day, the server was only releasing
    old DHCP leases but was consuming a lot of CPU.
   
    It looks like curent dhcp pruning:
 1) it's pruning old sessions (iterate on all current leases). It's
 important to note that it's only pruning session expired since more
 than one second
 2) it's looking for next lease to expire (iterate on all current leases
 again)
 3) it launchs an alarm to catch next expiration found in step 2). This
 value can be zero for leases just expired (but not pruned).
   
    So, for a second, dnsmasq could fall in a "prune loop" by doing:
 * Not pruning anything, since difftime() is not > 0
 * Run alarm again with zero as argument
   
    On a server with very large number of leases and releasing often
    sessions, that can waste a very big CPU time.
   
    Signed-off-by: Florent Fourcot 




>
> strace output:
>
> poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5,
> events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8,
> events=POLLIN}], 6, -1) = 1 ([{fd=4, revents=POLLERR}])
>     
> poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5,
> events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8,
> events=POLLIN}], 6, -1) = 1 ([{fd=4, revents=POLLERR}])
> poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5,
> events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8,
> events=POLLIN}], 6, -1) = 1 ([{fd=4, revents=PO^Cstrace: Process
> 293183 detached
>
> I can pretty much make this happen at will.  What can I provide to
> help debug this?

Start with stating how recent the source is that you are using.


>
> As a side note, I was not placing these routes into the default linux
> routing table.  Does dnsmasq need to be paying attention to these routes?

Side notes in a separate thread  please.


>
> donald
>

Regards

Geert Stappers


