Re: [Dnsmasq-discuss] dnsmasq using 100% cpu on router
Thanks Simon.

In your earlier message you said you thought this is probably dhcp related. I did manage to retrieve some logs from the time of the problem and there was a great deal of dhcp happening on the network at the time. I haven't had time to go over them yet but I can see repeated dhcp requests from the same clients over and over again and often only a few minutes (or less) apart.

Our network is only lightly loaded at the moment and I can't reproduce the problem on any client.

Does this sound like the same bug? Would the logs be useful to you?

Cheers

David

On 28 April 2014 18:36, Simon Kelley wrote:
> Note that this bug appears to be a hard lockup.
>
> https://bugs.launchpad.net/ubuntu/+source/dnsmasq/+bug/1313393
>
> investigations are continuing.
>
> Simon.

___
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss
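One way to quantify the "repeated dhcp requests from the same clients" David mentions is to count DHCPREQUEST lines per MAC address. The following is a minimal sketch, not a tool from this thread: the log line shape, the sample lines and the names `REQUEST_RE`, `count_requests` and `SAMPLE_LOG` are all assumptions for illustration, and the pattern may need adjusting to match what a given firmware actually logs.

```python
import re
from collections import Counter

# Assumed log shape (hypothetical sample, not taken from David's logs):
#   "... dnsmasq-dhcp[512]: DHCPREQUEST(br0) 192.168.1.23 aa:bb:cc:dd:ee:01"
REQUEST_RE = re.compile(
    r"DHCPREQUEST\([^)]*\)\s+\S+\s+((?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2})"
)


def count_requests(lines):
    """Map each client MAC address to its number of DHCPREQUEST log lines."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match:
            # Normalise case so the same client is not counted twice.
            counts[match.group(1).lower()] += 1
    return counts


# Invented sample lines, for illustration only.
SAMPLE_LOG = [
    "Apr 22 14:03:11 dnsmasq-dhcp[512]: DHCPREQUEST(br0) 192.168.1.23 aa:bb:cc:dd:ee:01",
    "Apr 22 14:03:11 dnsmasq-dhcp[512]: DHCPACK(br0) 192.168.1.23 aa:bb:cc:dd:ee:01 laptop",
    "Apr 22 14:04:40 dnsmasq-dhcp[512]: DHCPREQUEST(br0) 192.168.1.23 AA:BB:CC:DD:EE:01",
    "Apr 22 14:05:02 dnsmasq-dhcp[512]: DHCPREQUEST(br0) 192.168.1.57 aa:bb:cc:dd:ee:02",
]

if __name__ == "__main__":
    # Busiest clients first.
    for mac, hits in count_requests(SAMPLE_LOG).most_common():
        print(mac, hits)
```

Feeding the real log file in place of `SAMPLE_LOG` (for example `count_requests(open(path))`) would show whether a handful of clients account for most of the requests.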
Re: [Dnsmasq-discuss] dnsmasq using 100% cpu on router
Note that this bug appears to be a hard lockup.

https://bugs.launchpad.net/ubuntu/+source/dnsmasq/+bug/1313393

investigations are continuing.

Simon.

On 28/04/14 12:18, Kevin Darbyshire-Bryant wrote:
> I'm not sure I've helped really.
>
> Kevin
Re: [Dnsmasq-discuss] dnsmasq using 100% cpu on router
On 25/04/2014 09:37, David Joslin wrote:
> Hi Kevin and thanks for the help.

Apologies for delay in reply.

> Is it possible to upgrade the dnsmasq version on the router without
> waiting for the author of the tomato firmware to include a later
> version in a release of his firmware (and you mentioned that dnsmasq
> in tomato isn't a clean pull of Simon's release)?

Probably, but as you'd have to cross-compile it for MIPS and the 'Tomato' environment you might as well try to rebuild the entire firmware. I loosely 'maintain' a shadow of Simon's git repo of dnsmasq with the Tomato/Asuswrt tweaks here https://github.com/kdarbyshirebryant/dnsmasq - No guarantees etc etc, but I personally try to keep up to date with both 'Merlin's Asuswrt/rmerlin and put current dnsmasq in there too.

> Why would changing the location of the leasefile to a usb stick make a
> difference? If the issue, as Simon suggests, is caused by the constant
> rewriting of the lease database, then wouldn't its current location
> (which on a router would be RAM) be a faster/better option than a usb
> stick? Or is there another possible issue here that I've missed?

Agree, RAM should be faster but there is a finite amount of it and it's volatile... I quite like to store the database on something that survives reboots. Also, as tomato is compiled with 'no rtc', the code tries to minimise the number of writes to the leasefile on the basis it thinks it likely that flash memory is involved, so better to reduce the wear.

> The only recent change I've made to the router was the addition of a
> usb stick as the location for the writing of system logs and bandwidth
> and IP traffic usage logs (so that they weren't lost on a reboot). I
> had wondered if the cause of the problem was related to the speed of
> writing this stuff (which obviously includes dnsmasq logging) to the
> usb stick rather than RAM. That's why I turned off dnsmasq logging at
> one point but it didn't seem to make any difference.
>
> Thanks again for your help and I'll wait for your comments on the above.

I'm not sure I've helped really.

Kevin
Re: [Dnsmasq-discuss] dnsmasq using 100% cpu on router
Hi Kevin and thanks for the help.

Is it possible to upgrade the dnsmasq version on the router without waiting for the author of the tomato firmware to include a later version in a release of his firmware (and you mentioned that dnsmasq in tomato isn't a clean pull of Simon's release)?

Why would changing the location of the leasefile to a usb stick make a difference? If the issue, as Simon suggests, is caused by the constant rewriting of the lease database, then wouldn't its current location (which on a router would be RAM) be a faster/better option than a usb stick? Or is there another possible issue here that I've missed?

The only recent change I've made to the router was the addition of a usb stick as the location for the writing of system logs and bandwidth and IP traffic usage logs (so that they weren't lost on a reboot). I had wondered if the cause of the problem was related to the speed of writing this stuff (which obviously includes dnsmasq logging) to the usb stick rather than RAM. That's why I turned off dnsmasq logging at one point but it didn't seem to make any difference.

Thanks again for your help and I'll wait for your comments on the above.

Cheers

David

On 24 April 2014 21:13, Kevin Darbyshire-Bryant <ke...@darbyshire-bryant.me.uk> wrote:
> My suggestion for a start are to upgrade to dnsmasq 2.70 rather than a
> test release of 2.69. Also try changing the location of the leasefile
> to somewhere else e.g. a USB stick if your router supports it.
Re: [Dnsmasq-discuss] dnsmasq using 100% cpu on router
On 24/04/2014 20:49, Simon Kelley wrote:
> Logs would help. The pattern doesn't look familiar, but if I had to
> guess, I'd say that the problem is DHCP, not DNS. Every change to the
> DHCP lease database causes the file storing it to be re-written, and I
> suspect that's what's eating CPU, in disk wait.

Chaps,

Please be aware that the dnsmasq included in tomato is not a clean 'pull' out of Simon's release but includes some tweaks, mainly to the lease writing code (where it outputs 'remaining leasetime' rather than expiry time). There's also a 'helper' function that upon receipt of SIGUSR1 (or it may be 2, I can't remember) dumps the leasefile in a tomato specific format so that it may be read & parsed into the 'dhcp status' page.

Those changes were 'formalised' by me into IFDEF conditional compilation flags when I first investigated updating dnsmasq from v2.61 to something slightly newer which fixed the IPv6 RA flags. The original changes by Jon Zarate were identified and re-inserted after a few false starts. I am no 'C' coder!

My suggestions for a start are to upgrade to dnsmasq 2.70 rather than a test release of 2.69. Also try changing the location of the leasefile to somewhere else, e.g. a USB stick if your router supports it.

I've not encountered anything like this but then I don't have 100 clients.

Kevin
Re: [Dnsmasq-discuss] dnsmasq using 100% cpu on router
On 24/04/14 20:41, David Joslin wrote:
> If the problem was caused by dnsmasq being overloaded with requests, is it
> likely or possible that 50 clients could put very little load on it but 100
> clients could swamp it? Also, would the problem not show itself as soon as
> dnsmasq was restarted rather than showing the gradual increase in peak
> usage until it hits 100%?

Logs would help. The pattern doesn't look familiar, but if I had to guess, I'd say that the problem is DHCP, not DNS. Every change to the DHCP lease database causes the file storing it to be re-written, and I suspect that's what's eating CPU, in disk wait.

Version of dnsmasq in use would be useful, and a copy of your config (to me privately, if you prefer.)

When dnsmasq is running at 100%, try running

strace -p

that will run forever, printing what syscalls are being made. You can ctrl-c it after a short while, which will stop strace, but not dnsmasq.

Cheers,

Simon
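A raw strace capture can be long, so it may help to tally it by syscall name to see at a glance whether file writes or the select() loop dominate. The sketch below assumes the trace was saved to a text file (for example with strace's `-o` option, attached to the dnsmasq process ID) and that strace is available on the router at all. The sample trace lines, the file path in them and the names `SYSCALL_RE`, `summarize_syscalls` and `SAMPLE_TRACE` are invented for illustration and do not come from this thread.

```python
import re
from collections import Counter

# Syscall name at the start of a strace line, allowing an optional
# "[pid N] " or bare "N " prefix that strace adds when following children.
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(\w+)\(")


def summarize_syscalls(lines):
    """Count how often each syscall name appears in strace output lines."""
    counts = Counter()
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match:  # signal and "resumed" lines do not match and are skipped
            counts[match.group(1)] += 1
    return counts


# Hypothetical trace, only to show the shape of the input.
SAMPLE_TRACE = [
    'select(8, [3 4 5 6 7], NULL, NULL, NULL) = 1 (in [4])',
    'recvmsg(4, {msg_name(16)={sa_family=AF_INET}}, 0) = 300',
    'open("/tmp/example.leases", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 9',
    'write(9, "1398370000 aa:bb:cc:dd:ee:01 192"..., 62) = 62',
    'write(9, "1398370100 aa:bb:cc:dd:ee:02 192"..., 62) = 62',
    'close(9) = 0',
    '--- SIGALRM (Alarm clock) @ 0 (0) ---',
]

if __name__ == "__main__":
    for name, hits in summarize_syscalls(SAMPLE_TRACE).most_common():
        print(name, hits)
```

If a real capture were dominated by open/write/close on the lease file, that would be consistent with Simon's guess about lease rewrites; a trace that shows no syscalls at all would point more towards a userspace loop.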
Turning off logging is probably counter-productive here, >> the logs may have valuable clues. >> >> >> In the second case, DNSSEC is something to worry about. Do you have that >> turned on? >> >> Also, it's possible to arrive at configurations with DNS forwarding >> loops where once DNS query gets sent upstream, but somehow ends up back >> at the dnsmasq instance that originally forwarded it and then goes round >> in circles. It's quite difficult to do this without at least two dnsmasq >> instances, but it is possible. >> >> Finally, logging to a syslog daemon which does its own DNS lookups (to >> label logs from remote hosts) can create a collapse: dnsmasq will log >> several lines for each DNS query, if each of those lines generates a new >> DNS query which has to handled by dnsmasq, it all goes wrong very quickly. >> >> >> Cheers, >> >> >> Simon. >> >> >> >> ___ >> Dnsmasq-discuss mailing list >> Dnsmasq-discuss@lists.thekelleys.org.uk >> http://lists.
Re: [Dnsmasq-discuss] dnsmasq using 100% cpu on router
Thanks for the reply, Simon.

DNSSEC isn't enabled.

I wonder if the pattern of the problem gives any clues...

As I said, on a normal day with around 40-50 clients on the network there is no problem at all with dnsmasq managing to use barely 0 - 2% of the CPU. When the problem occurred there were a little over 100 clients. Running top showed dnsmasq using 100% cpu so I restarted dnsmasq and kept an eye on top. For maybe 5 or 10 minutes there was no problem, with dnsmasq using very little cpu. Then dnsmasq would start to peak at maybe 20-30% for a couple of seconds before dropping back. Then it would start peaking at higher and higher levels before dropping back. Eventually, after running for maybe half an hour it would start peaking at over 90% and staying there for longer before dropping back. At this point dns requests would become very slow (and maybe time out). And then dnsmasq would hit 100% cpu and would stay there. Dns requests would time out and only restarting dnsmasq would fix the problem. The pattern would then start over again.

I may be wrong but it doesn't seem that dnsmasq is hitting a bug that suddenly causes it to loop and hog the cpu until it's killed. It seems to gradually show more and more of the problem before it eventually hogs 100% cpu and has to be killed.

If the problem was caused by dnsmasq being overloaded with requests, is it likely or possible that 50 clients could put very little load on it but 100 clients could swamp it? Also, would the problem not show itself as soon as dnsmasq was restarted rather than showing the gradual increase in peak usage until it hits 100%?

I hope this helps. Any thoughts on this pattern?

Cheers

David

On 24 April 2014 12:41, Simon Kelley wrote:
> In the second case, DNSSEC is something to worry about. Do you have that
> turned on?
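David's question, whether 50 clients could be harmless while 100 swamp dnsmasq, can be explored with a very rough model of Simon's suggestion elsewhere in this thread that every lease change rewrites the whole lease file. The sketch below is only a back-of-the-envelope illustration: the function name `lease_rewrite_bytes`, the 80 bytes per lease and the 10 changes per client are made-up assumptions, not measurements, and the model says nothing about the gradual ramp-up David observed.

```python
def lease_rewrite_bytes(clients, changes_per_client, bytes_per_lease=80):
    """Rough bytes written if each lease change rewrites the whole lease file.

    All three inputs are illustrative assumptions, not measured values.
    """
    total_changes = clients * changes_per_client   # grows with client count
    file_size = clients * bytes_per_lease          # also grows with client count
    return total_changes * file_size


if __name__ == "__main__":
    for clients in (50, 100):
        written = lease_rewrite_bytes(clients, changes_per_client=10)
        print(clients, "clients ->", written, "bytes written")
```

Under these assumptions the number of changes and the size of the file both scale with the client count, so doubling the clients quadruples the bytes written. That would make a non-linear jump between 50 and 100 clients at least plausible, without proving it is what happened here.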
Re: [Dnsmasq-discuss] dnsmasq using 100% cpu on router
> The first thing is to try and decide which of two possible scenarios are
> happening. The first is that you've triggered a bug in the code and
> dnsmasq is looping somewhere without ever getting back to the select()
> loop and doing actual work. The second is that it's getting so much work
> that it's running out of CPU to do it.
>
> In the first case, dnsmasq will stop working entirely. Is that
> consistent with "problems resolving addresses" or does it still
> partially work? Turning off logging is probably counter-productive here,
> the logs may have valuable clues.

And if indeed the dnsmasq process is simply being inundated then presumably its socket(s) will start overflowing, which should show up in a netstat statistic somewhere. For the DNS portion that would be something in netstat -s I would think, the UDP section.

Knowing how much of this 100% CPU time is user space versus system/kernel would be goodness, as might a system call trace (eg strace)

happy benchmarking,

rick jones
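Both of Rick's suggestions can be approximated on Linux by reading /proc directly. The sketch below is an illustration under assumptions rather than a tested router tool: it parses the text of /proc/<pid>/stat for user and system CPU ticks, and the "Udp:" lines of /proc/net/snmp for the kind of counters that the UDP section of netstat -s reports. The sample strings and the names `parse_cpu_ticks`, `system_share` and `parse_udp_counters` are hypothetical, and older kernels may not expose every counter shown (RcvbufErrors in particular).

```python
def parse_cpu_ticks(stat_text):
    """Return (utime, stime) in clock ticks from the text of /proc/<pid>/stat."""
    # Field 2 (the command name) is in parentheses and may contain spaces,
    # so cut at the last ')' before splitting the remaining fields.
    after_comm = stat_text.rsplit(")", 1)[1].split()
    # after_comm[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(after_comm[11]), int(after_comm[12])


def system_share(utime, stime):
    """Fraction of the process's CPU time spent in the kernel."""
    total = utime + stime
    return stime / total if total else 0.0


def parse_udp_counters(snmp_text):
    """Return the UDP counters from the text of /proc/net/snmp as a dict."""
    # The file holds a header line and a value line per protocol;
    # "UdpLite:" lines are excluded by matching the exact "Udp:" prefix.
    rows = [line.split() for line in snmp_text.splitlines()
            if line.startswith("Udp:")]
    header, values = rows[0], rows[1]
    return dict(zip(header[1:], (int(v) for v in values[1:])))


# Invented sample data, for illustration only.
SAMPLE_STAT = ("512 (dnsmasq) R 1 512 512 0 -1 4194560 100 0 2 0 "
               "1500 4500 0 0 20 0 1 0 300 1000000 200")
SAMPLE_SNMP = (
    "Ip: Forwarding DefaultTTL\n"
    "Ip: 1 64\n"
    "Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors\n"
    "Udp: 90210 12 340 88000 340 0\n"
    "UdpLite: InDatagrams NoPorts\n"
    "UdpLite: 0 0\n"
)

if __name__ == "__main__":
    utime, stime = parse_cpu_ticks(SAMPLE_STAT)
    print("system share:", system_share(utime, stime))
    print("receive buffer errors:", parse_udp_counters(SAMPLE_SNMP)["RcvbufErrors"])
```

Sampling the real files twice, a few seconds apart, and comparing the differences would show whether the time is going to user space or the kernel, and whether receive-buffer errors are climbing while dnsmasq is busy.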
Re: [Dnsmasq-discuss] dnsmasq using 100% cpu on router
On 22/04/14 20:04, David Joslin wrote:
> However, we've recently run a couple of conferences in the building and the
> number of people using the wireless network has been just over 100. Several
> times there have been problems resolving addresses and when I've looked at
> the router dnsmasq has been using 100% cpu. Restarting dnsmasq temporarily
> fixes the problem but it occurs again maybe 20 minutes later.
>
> I've turned off logging, increased the cache-size and the maximum number of
> dhcp leases (anything I could see that might be a problem with more users)
> but this hasn't fixed the problem.

The first thing is to try and decide which of two possible scenarios is happening. The first is that you've triggered a bug in the code and dnsmasq is looping somewhere without ever getting back to the select() loop and doing actual work. The second is that it's getting so much work that it's running out of CPU to do it.

In the first case, dnsmasq will stop working entirely. Is that consistent with "problems resolving addresses" or does it still partially work? Turning off logging is probably counter-productive here, the logs may have valuable clues.

In the second case, DNSSEC is something to worry about. Do you have that turned on?

Also, it's possible to arrive at configurations with DNS forwarding loops where a DNS query gets sent upstream, but somehow ends up back at the dnsmasq instance that originally forwarded it and then goes round in circles. It's quite difficult to do this without at least two dnsmasq instances, but it is possible.

Finally, logging to a syslog daemon which does its own DNS lookups (to label logs from remote hosts) can create a collapse: dnsmasq will log several lines for each DNS query, and if each of those lines generates a new DNS query which has to be handled by dnsmasq, it all goes wrong very quickly.

Cheers,

Simon.
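The syslog feedback loop described in the last paragraph grows geometrically, which a few lines of arithmetic make concrete. This is a deliberately simplified worst-case sketch: the function name `queries_after_rounds` and the figure of three log lines per query are assumptions for illustration, and a real setup would be damped by caching and by how the syslog daemon batches its lookups.

```python
def queries_after_rounds(initial_queries, lines_per_query, rounds):
    """Queries generated in the final round of a logging feedback loop.

    Assumes every query is logged as `lines_per_query` lines and every
    logged line triggers exactly one new DNS query (no caching at all).
    """
    return initial_queries * lines_per_query ** rounds


if __name__ == "__main__":
    for rounds in range(6):
        print(rounds, queries_after_rounds(1, lines_per_query=3, rounds=rounds))
```

With those assumed numbers a single client query turns into 243 queries after five rounds, which is the sense in which it "all goes wrong very quickly".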
Re: [Dnsmasq-discuss] dnsmasq using 100% cpu on router
The router isn't being used for wi-fi. We have a Ubiquiti Unifi wi-fi system throughout the building. The router is just routing (and providing dns, dhcp etc).

David

On 23 April 2014 02:43, Weedy wrote:
> Even if you fix this you should look into better hardware.
>
> 480mhz and broadcom radios at your loads worries the hell out of me.
Re: [Dnsmasq-discuss] dnsmasq using 100% cpu on router
On 22 Apr 2014 15:10, "David Joslin" wrote:
> However, we've recently run a couple of conferences in the building and
> the number of people using the wireless network has been just over 100.

Even if you fix this you should look into better hardware.

480mhz and broadcom radios at your loads worries the hell out of me.
Re: [Dnsmasq-discuss] dnsmasq using 100% cpu on router
On 22/04/14 03:04 PM, David Joslin wrote:
> I wondered if anyone has come across anything similar or has any
> suggestions?

dnssec
[Dnsmasq-discuss] dnsmasq using 100% cpu on router
Hi

I have an Asus rt-n16 router running the Shibby version of the Tomato firmware which includes dnsmasq version 2.69test3. It's in use in a building that frequently has 50+ users on a wireless network and dnsmasq has performed extremely well with very little load on the router.

However, we've recently run a couple of conferences in the building and the number of people using the wireless network has been just over 100. Several times there have been problems resolving addresses and when I've looked at the router dnsmasq has been using 100% cpu. Restarting dnsmasq temporarily fixes the problem but it occurs again maybe 20 minutes later.

I've turned off logging, increased the cache-size and the maximum number of dhcp leases (anything I could see that might be a problem with more users) but this hasn't fixed the problem.

I wondered if anyone has come across anything similar or has any suggestions?

Thanks

David