Re: [Dnsmasq-discuss] dnsmasq using 100% cpu on router

2014-04-29 Thread David Joslin
Thanks Simon.

In your earlier message you said you thought this is probably dhcp related.
I did manage to retrieve some logs from the time of the problem and there
was a great deal of dhcp happening on the network at the time. I haven't
had time to go over them yet but I can see repeated dhcp requests from the
same clients over and over again and often only a few minutes (or less)
apart. Our network is only lightly loaded at the moment and I can't
reproduce the problem on any client. Does this sound like the same bug?
Would the logs be useful to you?
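
One way to check the "same clients over and over" impression against the logs is to count DHCP requests per MAC address. The sketch below is only illustrative: it assumes dnsmasq-dhcp log lines of the form "DHCPREQUEST(interface) ip mac", and the function names and threshold are made up for this example.

```python
import re
from collections import Counter

# Assumed log shape (not taken from the logs in this thread), e.g.:
#   "... dnsmasq-dhcp[123]: DHCPREQUEST(br0) 192.168.1.50 aa:bb:cc:dd:ee:ff"
DHCP_REQUEST = re.compile(
    r"DHCPREQUEST\((?P<iface>[^)]+)\)\s+(?P<ip>\S+)\s+"
    r"(?P<mac>(?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2})"
)


def count_requests_by_mac(lines):
    """Return a Counter mapping lower-cased MAC address to DHCPREQUEST count."""
    counts = Counter()
    for line in lines:
        match = DHCP_REQUEST.search(line)
        if match:
            counts[match.group("mac").lower()] += 1
    return counts


def noisy_clients(lines, threshold=3):
    """List (mac, count) pairs with at least `threshold` requests, busiest first."""
    return [
        (mac, n)
        for mac, n in count_requests_by_mac(lines).most_common()
        if n >= threshold
    ]
```

Feeding it the lines of a saved log (for example via open(path) on a copy taken off the router) gives a quick list of the clients that are renewing unusually often.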

Cheers

David


On 28 April 2014 18:36, Simon Kelley  wrote:

> Note that this bug appears to be a hard lockup.
>
> https://bugs.launchpad.net/ubuntu/+source/dnsmasq/+bug/1313393
>
> investigations are continuing.
>
>
> Simon.
>
>
>
>
> On 28/04/14 12:18, Kevin Darbyshire-Bryant wrote:
> > On 25/04/2014 09:37, David Joslin wrote:
> >> Hi Kevin and thanks for the help.
> >>
> > Apologies for delay in reply.
> >> Is it possible to upgrade the dnsmasq version on the router without
> >> waiting for the author of the tomato firmware to include a later
> >> version in a release of his firmware (and you mentioned that dnsmasq
> >> in tomato isn't a clean pull of Simon's release)?
> > Probably, but as you'd have to cross compile it to MIPS and 'Tomato'
> > environment you might as well try to rebuild the entire firmware.  I
> > loosely 'maintain' a shadow of Simon's git repo of dnsmasq with the
> > Tomato/Asuswrt tweaks here
> > https://github.com/kdarbyshirebryant/dnsmasq   - No guarantees etc etc,
> > but I personally try to keep up to date with both 'Merlin's
> > Asuswrt/rmerlin and put current dnsmasq in there too.
> >>
> >> Why would changing the location of the leasefile to a usb stick make a
> >> difference? If the issue, as Simon suggests, is caused by the constant
> >> rewriting of the lease database, then wouldn't its current location
> >> (which on a router would be RAM) be a faster/better option than a usb
> >> stick? Or is there another possible issue here that I've missed?
> > Agree, RAM should be faster but there is a finite amount of it and it's
> > volatile...I quite like to store the database on something that survives
> > reboots.  Also, as tomato is compiled with 'no rtc', the code tries to
> > minimise the number of writes to the leasefile on the basis it thinks it
> > likely that flash memory is involved, so better to reduce the wear.
> >>
> >> The only recent change I've made to the router was the addition of a
> >> usb stick as the location for the writing of system logs and bandwidth
> >> and IP traffic usage logs (so that they weren't lost on a reboot). I
> >> had wondered if the cause of the problem was related to the speed of
> >> writing this stuff (which obviously includes dnsmasq logging) to the
> >> usb stick rather than RAM. That's why I turned off dnsmasq logging at
> >> one point but it didn't seem to make any difference.
> >>
> >> Thanks again for your help and I'll wait for your comments on the above.
> > I'm not sure I've helped really.
> >
> > Kevin
> >
> >
> >
___
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss


Re: [Dnsmasq-discuss] dnsmasq using 100% cpu on router

2014-04-28 Thread Simon Kelley
Note that this bug appears to be a hard lockup.

https://bugs.launchpad.net/ubuntu/+source/dnsmasq/+bug/1313393

investigations are continuing.


Simon.




On 28/04/14 12:18, Kevin Darbyshire-Bryant wrote:
> On 25/04/2014 09:37, David Joslin wrote:
>> Hi Kevin and thanks for the help.
>>
> Apologies for delay in reply.
>> Is it possible to upgrade the dnsmasq version on the router without
>> waiting for the author of the tomato firmware to include a later
>> version in a release of his firmware (and you mentioned that dnsmasq
>> in tomato isn't a clean pull of Simon's release)?
> Probably, but as you'd have to cross compile it to MIPS and 'Tomato'
> environment you might as well try to rebuild the entire firmware.  I
> loosely 'maintain' a shadow of Simon's git repo of dnsmasq with the
> Tomato/Asuswrt tweaks here
> https://github.com/kdarbyshirebryant/dnsmasq   - No guarantees etc etc,
> but I personally try to keep up to date with both 'Merlin's
> Asuswrt/rmerlin and put current dnsmasq in there too.
>>
>> Why would changing the location of the leasefile to a usb stick make a
>> difference? If the issue, as Simon suggests, is caused by the constant
>> rewriting of the lease database, then wouldn't its current location
>> (which on a router would be RAM) be a faster/better option than a usb
>> stick? Or is there another possible issue here that I've missed?
> Agree, RAM should be faster but there is a finite amount of it and it's
> volatile...I quite like to store the database on something that survives
> reboots.  Also, as tomato is compiled with 'no rtc', the code tries to
> minimise the number of writes to the leasefile on the basis it thinks it
> likely that flash memory is involved, so better to reduce the wear.
>>
>> The only recent change I've made to the router was the addition of a
>> usb stick as the location for the writing of system logs and bandwidth
>> and IP traffic usage logs (so that they weren't lost on a reboot). I
>> had wondered if the cause of the problem was related to the speed of
>> writing this stuff (which obviously includes dnsmasq logging) to the
>> usb stick rather than RAM. That's why I turned off dnsmasq logging at
>> one point but it didn't seem to make any difference.
>>
>> Thanks again for your help and I'll wait for your comments on the above.
> I'm not sure I've helped really.
> 
> Kevin
> 
> 
> 


Re: [Dnsmasq-discuss] dnsmasq using 100% cpu on router

2014-04-28 Thread Kevin Darbyshire-Bryant
On 25/04/2014 09:37, David Joslin wrote:
> Hi Kevin and thanks for the help.
>
Apologies for delay in reply.
> Is it possible to upgrade the dnsmasq version on the router without
> waiting for the author of the tomato firmware to include a later
> version in a release of his firmware (and you mentioned that dnsmasq
> in tomato isn't a clean pull of Simon's release)?
Probably, but as you'd have to cross compile it to MIPS and 'Tomato'
environment you might as well try to rebuild the entire firmware.  I
loosely 'maintain' a shadow of Simon's git repo of dnsmasq with the
Tomato/Asuswrt tweaks here
https://github.com/kdarbyshirebryant/dnsmasq   - No guarantees etc etc,
but I personally try to keep up to date with both 'Merlin's
Asuswrt/rmerlin and put current dnsmasq in there too.
>
> Why would changing the location of the leasefile to a usb stick make a
> difference? If the issue, as Simon suggests, is caused by the constant
> rewriting of the lease database, then wouldn't its current location
> (which on a router would be RAM) be a faster/better option than a usb
> stick? Or is there another possible issue here that I've missed?
Agree, RAM should be faster but there is a finite amount of it and it's
volatile...I quite like to store the database on something that survives
reboots.  Also, as tomato is compiled with 'no rtc', the code tries to
minimise the number of writes to the leasefile on the basis it thinks it
likely that flash memory is involved, so better to reduce the wear.
>
> The only recent change I've made to the router was the addition of a
> usb stick as the location for the writing of system logs and bandwidth
> and IP traffic usage logs (so that they weren't lost on a reboot). I
> had wondered if the cause of the problem was related to the speed of
> writing this stuff (which obviously includes dnsmasq logging) to the
> usb stick rather than RAM. That's why I turned off dnsmasq logging at
> one point but it didn't seem to make any difference.
>
> Thanks again for your help and I'll wait for your comments on the above.
I'm not sure I've helped really.

Kevin




Re: [Dnsmasq-discuss] dnsmasq using 100% cpu on router

2014-04-25 Thread David Joslin
Hi Kevin and thanks for the help.

Is it possible to upgrade the dnsmasq version on the router without waiting
for the author of the tomato firmware to include a later version in a
release of his firmware (and you mentioned that dnsmasq in tomato isn't a
clean pull of Simon's release)?

Why would changing the location of the leasefile to a usb stick make a
difference? If the issue, as Simon suggests, is caused by the constant
rewriting of the lease database, then wouldn't its current location (which
on a router would be RAM) be a faster/better option than a usb stick? Or is
there another possible issue here that I've missed?

The only recent change I've made to the router was the addition of a usb
stick as the location for the writing of system logs and bandwidth and IP
traffic usage logs (so that they weren't lost on a reboot). I had wondered
if the cause of the problem was related to the speed of writing this stuff
(which obviously includes dnsmasq logging) to the usb stick rather than
RAM. That's why I turned off dnsmasq logging at one point but it didn't
seem to make any difference.

Thanks again for your help and I'll wait for your comments on the above.

Cheers

David




On 24 April 2014 21:13, Kevin Darbyshire-Bryant <
ke...@darbyshire-bryant.me.uk> wrote:

> On 24/04/2014 20:49, Simon Kelley wrote:
> > On 24/04/14 20:41, David Joslin wrote:
> >> Thanks for the reply, Simon.
> >>
> >> DNSSEC isn't enabled.
> >>
> >> I wonder if the pattern of the problem gives any clues...
> >>
> >> As I said, on a normal day with around 40-50 clients on the network there
> >> is no problem at all with dnsmasq managing to use barely 0 - 2% of the CPU.
> >> When the problem occurred there were a little over 100 clients. Running top
> >> showed dnsmasq using 100% cpu so I restarted dnsmasq and kept an eye on
> >> top. For maybe 5 or 10 minutes there was no problem, with dnsmasq using
> >> very little cpu. Then dnsmasq would start to peak at maybe 20-30% for a
> >> couple of seconds before dropping back. Then it would start peaking at
> >> higher and higher levels before dropping back. Eventually, after running
> >> for maybe half an hour it would start peaking at over 90% and staying there
> >> for longer before dropping back. At this point dns requests would become
> >> very slow (and maybe time out). And then dnsmasq would hit 100% cpu and
> >> would stay there. Dns requests would time out and only restarting dnsmasq
> >> would fix the problem. The pattern would then start over again.
> >>
> >> I may be wrong but it doesn't seem that dnsmasq is hitting a bug that
> >> suddenly causes it to loop and hog the cpu until it's killed. It seems to
> >> gradually show more and more of the problem before it eventually hogs 100%
> >> cpu and has to be killed.
> >>
> >> If the problem was caused by dnsmasq being overloaded with requests, is it
> >> likely or possible that 50 clients could put very little load on it but 100
> >> clients could swamp it? Also, would the problem not show itself as soon as
> >> dnsmasq was restarted rather than showing the gradual increase in peak
> >> usage until it hits 100%?
> >
> > Logs would help. The pattern doesn't look familiar, but if I had to
> > guess, I'd say that the problem is DHCP, not DNS. Every change to the
> > DHCP lease database causes the file storing it to be re-written, and I
> > suspect that's what's eating CPU, in disk wait.
> >
> > Version of dnsmasq in use would be useful, and a copy of your config (to
> > me privately, if you prefer.)
> >
> > When dnsmasq is running at 100%, try running
> >
> > strace -p 
> >
> > that will run forever, printing what syscalls are being made, you can
> > ctrl-c it after a short while, which will stop strace, but not dnsmasq.
> >
> >
> > Cheers,
> >
> >
> > Simon
> >
> >
>
> Chaps,
>
> Please be aware that the dnsmasq included in tomato is not a clean
> 'pull' out of Simon's release but includes some tweaks, mainly to the
> lease writing code (where it outputs 'remaining leasetime' rather than
> expiry time)  There's also a 'helper' function that upon receipt of
> SIGUSR1 (or it may be 2 I can't remember) dumps the leasefile in a
> tomato specific format so that it may be read & parsed into the 'dhcp
> status' page.
>
> Those changes were 'formalised' by me into IFDEF conditional compilation
> flags when I first investigated updating dnsmasq from v2.61 to something
> slightly newer which fixed the IPv6 RA flags.  The original changes by
> Jon Zarate were identified and re-inserted after a few false starts.  I
> am no 'C' coder!
>
> My suggestions for a start are to upgrade to dnsmasq 2.70 rather than a
> test release of 2.69.  Also try changing the location of the leasefile
> to somewhere else e.g. a USB stick if your router supports it.
>
> I've not encountered anything like this but then I don't have 100 clients.
>
> Kevin
>
>
>

Re: [Dnsmasq-discuss] dnsmasq using 100% cpu on router

2014-04-24 Thread Kevin Darbyshire-Bryant
On 24/04/2014 20:49, Simon Kelley wrote:
> On 24/04/14 20:41, David Joslin wrote:
>> Thanks for the reply, Simon.
>>
>> DNSSEC isn't enabled.
>>
>> I wonder if the pattern of the problem gives any clues...
>>
>> As I said, on a normal day with around 40-50 clients on the network there
>> is no problem at all with dnsmasq managing to use barely 0 - 2% of the CPU.
>> When the problem occurred there were a little over 100 clients. Running top
>> showed dnsmasq using 100% cpu so I restarted dnsmasq and kept an eye on
>> top. For maybe 5 or 10 minutes there was no problem, with dnsmasq using
>> very little cpu. Then dnsmasq would start to peak at maybe 20-30% for a
>> couple of seconds before dropping back. Then it would start peaking at
>> higher and higher levels before dropping back. Eventually, after running
>> for maybe half an hour it would start peaking at over 90% and staying there
>> for longer before dropping back. At this point dns requests would become
>> very slow (and maybe time out). And then dnsmasq would hit 100% cpu and
>> would stay there. Dns requests would time out and only restarting dnsmasq
>> would fix the problem. The pattern would then start over again.
>>
>> I may be wrong but it doesn't seem that dnsmasq is hitting a bug that
>> suddenly causes it to loop and hog the cpu until it's killed. It seems to
>> gradually show more and more of the problem before it eventually hogs 100%
>> cpu and has to be killed.
>>
>> If the problem was caused by dnsmasq being overloaded with requests, is it
>> likely or possible that 50 clients could put very little load on it but 100
>> clients could swamp it? Also, would the problem not show itself as soon as
>> dnsmasq was restarted rather than showing the gradual increase in peak
>> usage until it hits 100%?
>
> Logs would help. The pattern doesn't look familiar, but if I had to
> guess, I'd say that the problem is DHCP, not DNS. Every change to the
> DHCP lease database causes the file storing it to be re-written, and I
> suspect that's what's eating CPU, in disk wait.
>
> Version of dnsmasq in use would be useful, and a copy of your config (to
> me privately, if you prefer.)
>
> When dnsmasq is running at 100%, try running
>
> strace -p 
>
> that will run forever, printing what syscalls are being made, you can
> ctrl-c it after a short while, which will stop strace, but not dnsmasq.
>
>
> Cheers,
>
>
> Simon
>
>

Chaps,

Please be aware that the dnsmasq included in tomato is not a clean
'pull' out of Simon's release but includes some tweaks, mainly to the
lease writing code (where it outputs 'remaining leasetime' rather than
expiry time)  There's also a 'helper' function that upon receipt of
SIGUSR1 (or it may be 2 I can't remember) dumps the leasefile in a
tomato specific format so that it may be read & parsed into the 'dhcp
status' page.

Those changes were 'formalised' by me into IFDEF conditional compilation
flags when I first investigated updating dnsmasq from v2.61 to something
slightly newer which fixed the IPv6 RA flags.  The original changes by
Jon Zarate were identified and re-inserted after a few false starts.  I
am no 'C' coder!

My suggestions for a start are to upgrade to dnsmasq 2.70 rather than a
test release of 2.69.  Also try changing the location of the leasefile
to somewhere else e.g. a USB stick if your router supports it.

I've not encountered anything like this but then I don't have 100 clients.

Kevin






Re: [Dnsmasq-discuss] dnsmasq using 100% cpu on router

2014-04-24 Thread Simon Kelley
On 24/04/14 20:41, David Joslin wrote:
> Thanks for the reply, Simon.
> 
> DNSSEC isn't enabled.
> 
> I wonder if the pattern of the problem gives any clues...
> 
> As I said, on a normal day with around 40-50 clients on the network there
> is no problem at all with dnsmasq managing to use barely 0 - 2% of the CPU.
> When the problem occurred there were a little over 100 clients. Running top
> showed dnsmasq using 100% cpu so I restarted dnsmasq and kept an eye on
> top. For maybe 5 or 10 minutes there was no problem, with dnsmasq using
> very little cpu. Then dnsmasq would start to peak at maybe 20-30% for a
> couple of seconds before dropping back. Then it would start peaking at
> higher and higher levels before dropping back. Eventually, after running
> for maybe half an hour it would start peaking at over 90% and staying there
> for longer before dropping back. At this point dns requests would become
> very slow (and maybe time out). And then dnsmasq would hit 100% cpu and
> would stay there. Dns requests would time out and only restarting dnsmasq
> would fix the problem. The pattern would then start over again.
> 
> I may be wrong but it doesn't seem that dnsmasq is hitting a bug that
> suddenly causes it to loop and hog the cpu until it's killed. It seems to
> gradually show more and more of the problem before it eventually hogs 100%
> cpu and has to be killed.
> 
> If the problem was caused by dnsmasq being overloaded with requests, is it
> likely or possible that 50 clients could put very little load on it but 100
> clients could swamp it? Also, would the problem not show itself as soon as
> dnsmasq was restarted rather than showing the gradual increase in peak
> usage until it hits 100%?


Logs would help. The pattern doesn't look familiar, but if I had to
guess, I'd say that the problem is DHCP, not DNS. Every change to the
DHCP lease database causes the file storing it to be re-written, and I
suspect that's what's eating CPU, in disk wait.

Version of dnsmasq in use would be useful, and a copy of your config (to
me privately, if you prefer.)

When dnsmasq is running at 100%, try running

strace -p 

that will run forever, printing what syscalls are being made, you can
ctrl-c it after a short while, which will stop strace, but not dnsmasq.
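
If the trace is long, a per-syscall tally can make the dominant activity easier to spot. The following is a rough sketch, assuming the trace was saved to a file with strace's -o option and that each line starts with the syscall name followed by "(". The function name is invented for this example.

```python
import re
from collections import Counter

# Matches the syscall name at the start of a trace line. An optional
# "[pid N]" or bare PID prefix (as produced when following children)
# is tolerated. Signal and "<... resumed>" lines do not match.
SYSCALL = re.compile(
    r"^(?:\[pid\s+\d+\]\s+)?(?:\d+\s+)?(?P<name>[a-z_][a-z0-9_]*)\("
)


def tally_syscalls(lines):
    """Count how often each syscall name appears in saved strace output."""
    counts = Counter()
    for line in lines:
        match = SYSCALL.match(line)
        if match:
            counts[match.group("name")] += 1
    return counts
```

A tally dominated by open/write/rename-style calls would point towards file rewriting, while one dominated by recvfrom/sendto-style calls would point towards sheer query volume. That reading is a heuristic, not a diagnosis.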


Cheers,


Simon

> 
> I hope this helps. Any thoughts on this pattern?
> 
> Cheers
> 
> David
> 
> 
> On 24 April 2014 12:41, Simon Kelley  wrote:
> 
>> On 22/04/14 20:04, David Joslin wrote:
>>> Hi
>>>
>>> I have an Asus rt-n16 router running the Shibby version of the Tomato
>>> firmware which includes dnsmasq version 2.69test3. It's in use in a
>>> building that frequently has 50+ users on a wireless network and dnsmasq
>>> has performed extremely well with very little load on the router.
>>>
>>> However, we've recently run a couple of conferences in the building and the
>>> number of people using the wireless network has been just over 100. Several
>>> times there have been problems resolving addresses and when I've looked at
>>> the router dnsmasq has been using 100% cpu. Restarting dnsmasq temporarily
>>> fixes the problem but it occurs again maybe 20 minutes later.
>>>
>>> I've turned off logging, increased the cache-size and the maximum number of
>>> dhcp leases (anything I could see that might be a problem with more users)
>>> but this hasn't fixed the problem.
>>>
>>> I wondered if anyone has come across anything similar or has any
>>> suggestions?
>>>
>>
>> The first thing is to try and decide which of two possible scenarios is
>> happening. The first is that you've triggered a bug in the code and
>> dnsmasq is looping somewhere without ever getting back to the select()
>> loop and doing actual work. The second is that it's getting so much work
>> that it's running out of CPU to do it.
>>
>> In the first case, dnsmasq will stop working entirely. Is that
>> consistent with  "problems resolving addresses" or does it still
>> partially work? Turning off logging is probably counter-productive here,
>> the logs may have valuable clues.
>>
>>
>> In the second case, DNSSEC is something to worry about. Do you have that
>> turned on?
>>
>> Also, it's possible to arrive at configurations with DNS forwarding
>> loops where a DNS query gets sent upstream, but somehow ends up back
>> at the dnsmasq instance that originally forwarded it and then goes round
>> in circles. It's quite difficult to do this without at least two dnsmasq
>> instances, but it is possible.
>>
>> Finally, logging to a syslog daemon which does its own DNS lookups (to
>> label logs from remote hosts) can create a collapse: dnsmasq will log
>> several lines for each DNS query, if each of those lines generates a new
>> DNS query which has to be handled by dnsmasq, it all goes wrong very quickly.
>>
>>
>> Cheers,
>>
>>
>> Simon.
>>
>>
>>

Re: [Dnsmasq-discuss] dnsmasq using 100% cpu on router

2014-04-24 Thread David Joslin
Thanks for the reply, Simon.

DNSSEC isn't enabled.

I wonder if the pattern of the problem gives any clues...

As I said, on a normal day with around 40-50 clients on the network there
is no problem at all with dnsmasq managing to use barely 0 - 2% of the CPU.
When the problem occurred there were a little over 100 clients. Running top
showed dnsmasq using 100% cpu so I restarted dnsmasq and kept an eye on
top. For maybe 5 or 10 minutes there was no problem, with dnsmasq using
very little cpu. Then dnsmasq would start to peak at maybe 20-30% for a
couple of seconds before dropping back. Then it would start peaking at
higher and higher levels before dropping back. Eventually, after running
for maybe half an hour it would start peaking at over 90% and staying there
for longer before dropping back. At this point dns requests would become
very slow (and maybe time out). And then dnsmasq would hit 100% cpu and
would stay there. Dns requests would time out and only restarting dnsmasq
would fix the problem. The pattern would then start over again.

I may be wrong but it doesn't seem that dnsmasq is hitting a bug that
suddenly causes it to loop and hog the cpu until it's killed. It seems to
gradually show more and more of the problem before it eventually hogs 100%
cpu and has to be killed.

If the problem was caused by dnsmasq being overloaded with requests, is it
likely or possible that 50 clients could put very little load on it but 100
clients could swamp it? Also, would the problem not show itself as soon as
dnsmasq was restarted rather than showing the gradual increase in peak
usage until it hits 100%?

I hope this helps. Any thoughts on this pattern?

Cheers

David


On 24 April 2014 12:41, Simon Kelley  wrote:

> On 22/04/14 20:04, David Joslin wrote:
> > Hi
> >
> > I have an Asus rt-n16 router running the Shibby version of the Tomato
> > firmware which includes dnsmasq version 2.69test3. It's in use in a
> > building that frequently has 50+ users on a wireless network and dnsmasq
> > has performed extremely well with very little load on the router.
> >
> > However, we've recently run a couple of conferences in the building and the
> > number of people using the wireless network has been just over 100. Several
> > times there have been problems resolving addresses and when I've looked at
> > the router dnsmasq has been using 100% cpu. Restarting dnsmasq temporarily
> > fixes the problem but it occurs again maybe 20 minutes later.
> >
> > I've turned off logging, increased the cache-size and the maximum number of
> > dhcp leases (anything I could see that might be a problem with more users)
> > but this hasn't fixed the problem.
> >
> > I wondered if anyone has come across anything similar or has any
> > suggestions?
> >
>
> The first thing is to try and decide which of two possible scenarios is
> happening. The first is that you've triggered a bug in the code and
> dnsmasq is looping somewhere without ever getting back to the select()
> loop and doing actual work. The second is that it's getting so much work
> that it's running out of CPU to do it.
>
> In the first case, dnsmasq will stop working entirely. Is that
> consistent with  "problems resolving addresses" or does it still
> partially work? Turning off logging is probably counter-productive here,
> the logs may have valuable clues.
>
>
> In the second case, DNSSEC is something to worry about. Do you have that
> turned on?
>
> Also, it's possible to arrive at configurations with DNS forwarding
> loops where a DNS query gets sent upstream, but somehow ends up back
> at the dnsmasq instance that originally forwarded it and then goes round
> in circles. It's quite difficult to do this without at least two dnsmasq
> instances, but it is possible.
>
> Finally, logging to a syslog daemon which does its own DNS lookups (to
> label logs from remote hosts) can create a collapse: dnsmasq will log
> several lines for each DNS query, if each of those lines generates a new
> DNS query which has to be handled by dnsmasq, it all goes wrong very quickly.
>
>
> Cheers,
>
>
> Simon.
>
>
>


Re: [Dnsmasq-discuss] dnsmasq using 100% cpu on router

2014-04-24 Thread Rick Jones


> The first thing is to try and decide which of two possible scenarios is
> happening. The first is that you've triggered a bug in the code and
> dnsmasq is looping somewhere without ever getting back to the select()
> loop and doing actual work. The second is that it's getting so much work
> that it's running out of CPU to do it.
>
> In the first case, dnsmasq will stop working entirely. Is that
> consistent with  "problems resolving addresses" or does it still
> partially work? Turning off logging is probably counter-productive here,
> the logs may have valuable clues.


And if indeed the dnsmasq process is simply being inundated then 
presumably its socket(s) will start overflowing which should trigger a 
netstat somewhere.  For the DNS portion that would be something in 
netstat -s I would think, the UDP section.
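
The UDP counters that netstat -s reports can also be read from /proc/net/snmp on Linux. As a sketch only (the function name is made up, and which counters are present depends on the kernel), the "Udp:" header and value lines can be paired up like this:

```python
def parse_udp_counters(snmp_text):
    """Pair the 'Udp:' header line with its value line from /proc/net/snmp.

    Returns a dict such as {"InDatagrams": 1000, "RcvbufErrors": 42, ...},
    or an empty dict if no Udp section is found.
    """
    header = None
    for line in snmp_text.splitlines():
        # "UdpLite:" lines are deliberately excluded by matching "Udp:" exactly.
        if line.startswith("Udp:"):
            fields = line.split()[1:]
            if header is None:
                header = fields  # first Udp: line names the counters
            else:
                return dict(zip(header, (int(v) for v in fields)))
    return {}
```

Taking two readings a minute apart and comparing InErrors and RcvbufErrors (where the kernel provides them) would show whether datagrams are being dropped because the receiving process is not keeping up.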


Knowing how much of this 100% CPU time is user space versus 
system/kernel would be goodness, as might a system call trace (eg strace)
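
For the user versus system split, top's per-process figure does not separate the two, but /proc/PID/stat on Linux does: fields 14 and 15 are utime and stime in clock ticks. A minimal sketch, with invented function names, assuming Python is available somewhere the file contents can be read or copied to:

```python
def parse_cpu_ticks(stat_text):
    """Return (utime, stime) in clock ticks from the text of /proc/PID/stat."""
    # The command name (field 2) is in parentheses and may contain spaces,
    # so split only after the last ')'. The remainder starts at field 3.
    rest = stat_text[stat_text.rindex(")") + 1:].split()
    return int(rest[11]), int(rest[12])  # fields 14 (utime) and 15 (stime)


def read_cpu_ticks(pid):
    """Read the current (utime, stime) pair for a running process."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_cpu_ticks(f.read())


def user_system_split(before, after):
    """Fractions of CPU time spent in user and system mode between two samples."""
    user = after[0] - before[0]
    system = after[1] - before[1]
    total = user + system
    if total == 0:
        return 0.0, 0.0
    return user / total, system / total
```

Two calls to read_cpu_ticks a few seconds apart, passed to user_system_split, give the proportions. Mostly user time would suggest dnsmasq itself is spinning, mostly system time would suggest the kernel is busy on its behalf (I/O, networking).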


happy benchmarking,

rick jones



Re: [Dnsmasq-discuss] dnsmasq using 100% cpu on router

2014-04-24 Thread Simon Kelley
On 22/04/14 20:04, David Joslin wrote:
> Hi
> 
> I have an Asus rt-n16 router running the Shibby version of the Tomato
> firmware which includes dnsmasq version 2.69test3. It's in use in a
> building that frequently has 50+ users on a wireless network and dnsmasq
> has performed extremely well with very little load on the router.
> 
> However, we've recently run a couple of conferences in the building and the
> number of people using the wireless network has been just over 100. Several
> times there have been problems resolving addresses and when I've looked at
> the router dnsmasq has been using 100% cpu. Restarting dnsmasq temporarily
> fixes the problem but it occurs again maybe 20 minutes later.
> 
> I've turned off logging, increased the cache-size and the maximum number of
> dhcp leases (anything I could see that might be a problem with more users)
> but this hasn't fixed the problem.
> 
> I wondered if anyone has come across anything similar or has any
> suggestions?
> 

The first thing is to try and decide which of two possible scenarios is
happening. The first is that you've triggered a bug in the code and
dnsmasq is looping somewhere without ever getting back to the select()
loop and doing actual work. The second is that it's getting so much work
that it's running out of CPU to do it.

In the first case, dnsmasq will stop working entirely. Is that
consistent with  "problems resolving addresses" or does it still
partially work? Turning off logging is probably counter-productive here,
the logs may have valuable clues.


In the second case, DNSSEC is something to worry about. Do you have that
turned on?

Also, it's possible to arrive at configurations with DNS forwarding
loops where a DNS query gets sent upstream, but somehow ends up back
at the dnsmasq instance that originally forwarded it and then goes round
in circles. It's quite difficult to do this without at least two dnsmasq
instances, but it is possible.

Finally, logging to a syslog daemon which does its own DNS lookups (to
label logs from remote hosts) can create a collapse: dnsmasq will log
several lines for each DNS query, if each of those lines generates a new
DNS query which has to be handled by dnsmasq, it all goes wrong very quickly.
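
To put rough numbers on the syslog feedback case, here is a toy model. It is a worst-case simplification, not a description of any real syslog daemon: it assumes every logged line triggers exactly one new DNS query with no caching, and the function and parameter names are invented for the illustration.

```python
def feedback_queries(lines_per_query, generations):
    """Queries per 'generation' when each log line causes one further query.

    Generation 0 is a single client query. Each query logs
    `lines_per_query` lines, and each line is assumed to cause one
    new query in the next generation.
    """
    counts = []
    queries = 1
    for _ in range(generations):
        counts.append(queries)
        queries *= lines_per_query
    return counts
```

With three log lines per query, feedback_queries(3, 4) gives 1, 3, 9, 27: the load grows geometrically from a single lookup, which is why this kind of loop collapses so fast.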


Cheers,


Simon.





Re: [Dnsmasq-discuss] dnsmasq using 100% cpu on router

2014-04-23 Thread David Joslin
The router isn't being used for wi-fi. We have a Ubiquiti Unifi wi-fi
system throughout the building. The router is just routing (and providing
dns, dhcp etc).

David


On 23 April 2014 02:43, Weedy  wrote:

>
> On 22 Apr 2014 15:10, "David Joslin"  wrote:
> >
> > Hi
> >
> > I have an Asus rt-n16 router running the Shibby version of the Tomato
> > firmware which includes dnsmasq version 2.69test3. It's in use in a
> > building that frequently has 50+ users on a wireless network and dnsmasq
> > has performed extremely well with very little load on the router.
> >
> > However, we've recently run a couple of conferences in the building and
> > the number of people using the wireless network has been just over 100.
>
> Even if you fix this you should look into better hardware.
>
> 480mhz and broadcom radios at your loads worries the hell out of me.
>


Re: [Dnsmasq-discuss] dnsmasq using 100% cpu on router

2014-04-22 Thread Weedy
On 22 Apr 2014 15:10, "David Joslin"  wrote:
>
> Hi
>
> I have an Asus rt-n16 router running the Shibby version of the Tomato
> firmware which includes dnsmasq version 2.69test3. It's in use in a
> building that frequently has 50+ users on a wireless network and dnsmasq
> has performed extremely well with very little load on the router.
>
> However, we've recently run a couple of conferences in the building and
> the number of people using the wireless network has been just over 100.

Even if you fix this you should look into better hardware.

480mhz and broadcom radios at your loads worries the hell out of me.


Re: [Dnsmasq-discuss] dnsmasq using 100% cpu on router

2014-04-22 Thread Alex Xu
On 22/04/14 03:04 PM, David Joslin wrote:
> Hi
> 
> I have an Asus rt-n16 router running the Shibby version of the Tomato
> firmware which includes dnsmasq version 2.69test3. It's in use in a
> building that frequently has 50+ users on a wireless network and dnsmasq
> has performed extremely well with very little load on the router.
> 
> However, we've recently run a couple of conferences in the building and the
> number of people using the wireless network has been just over 100. Several
> times there have been problems resolving addresses and when I've looked at
> the router dnsmasq has been using 100% cpu. Restarting dnsmasq temporarily
> fixes the problem but it occurs again maybe 20 minutes later.
> 
> I've turned off logging, increased the cache-size and the maximum number of
> dhcp leases (anything I could see that might be a problem with more users)
> but this hasn't fixed the problem.
> 
> I wondered if anyone has come across anything similar or has any
> suggestions?
> 
> Thanks
> 
> David
> 
> 
> 

dnssec





[Dnsmasq-discuss] dnsmasq using 100% cpu on router

2014-04-22 Thread David Joslin
Hi

I have an Asus rt-n16 router running the Shibby version of the Tomato
firmware which includes dnsmasq version 2.69test3. It's in use in a
building that frequently has 50+ users on a wireless network and dnsmasq
has performed extremely well with very little load on the router.

However, we've recently run a couple of conferences in the building and the
number of people using the wireless network has been just over 100. Several
times there have been problems resolving addresses and when I've looked at
the router dnsmasq has been using 100% cpu. Restarting dnsmasq temporarily
fixes the problem but it occurs again maybe 20 minutes later.

I've turned off logging, increased the cache-size and the maximum number of
dhcp leases (anything I could see that might be a problem with more users)
but this hasn't fixed the problem.

I wondered if anyone has come across anything similar or has any
suggestions?

Thanks

David