Re: [Dnsmasq-discuss] dnsmasq stops receiving packets after network restart
Hi,

On Thu, Sep 27, 2018 at 9:53 PM Simon Kelley wrote:
> Progress. AFAIK, the dnsmasq behaviour around this has not changed at all
> in that time period. I think it's likely that the change is in the
> OpenWRT network infrastructure, maybe hotplug/coldplug stuff that now
> destroys and re-creates the kernel-level network device, rather than
> just reloading its configuration.
>
> I run the bleeding edge dnsmasq code (we suffer so you don't have to!)
> on an old, stable Chaos-calmer OpenWRT install, and I'm not seeing this
> effect, which adds weight to the theory that the change is elsewhere.

Yes, I agree. I also had not seen this error until recently, so
something else must have broken. I will try to dig a bit when or if I
have time, and see if I can discover something.

> Dnsmasq is quite clever at handling changes in kernel network level
> devices under its feet, maybe there's a way to re-bind when that
> happens? I'll have a look. A configuration option would be the last
> resort here: adding "pull this lever to make it work" options is
> something I try and avoid.

I agree here as well. I checked whether there was a socket event we were
missing, but no event was received on my boxes. I guess the most elegant
approach would be to monitor RTNLGRP_LINK for DELLINK, and close the
socket when DELLINK arrives. The socket could then be recreated on
NEWLINK, or, probably even better, NEWADDR.

BR,
Kristian
___
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss
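The DELLINK/NEWADDR idea in the message above could look roughly like the following. This is only a sketch under assumptions, not dnsmasq code: the names open_rtnl_monitor(), classify_nlmsg() and struct link_event are hypothetical, and the socket subscribes through the legacy RTMGRP_LINK and RTMGRP_IPV4_IFADDR bitmask in nl_groups, which selects the same multicast groups as RTNLGRP_LINK and RTNLGRP_IPV4_IFADDR.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical event type; nothing here is taken from dnsmasq. */
enum link_event_kind { EV_NONE, EV_LINK_GONE, EV_ADDR_ADDED };

struct link_event {
  enum link_event_kind kind;
  int ifindex; /* kernel interface index the event refers to */
};

/* Open a netlink socket subscribed to link and IPv4 address changes.
   Returns the descriptor, or -1 on error. */
int open_rtnl_monitor(void)
{
  struct sockaddr_nl addr;
  int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

  if (fd == -1)
    return -1;

  memset(&addr, 0, sizeof(addr));
  addr.nl_family = AF_NETLINK;
  addr.nl_groups = RTMGRP_LINK | RTMGRP_IPV4_IFADDR;

  if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1)
    {
      close(fd);
      return -1;
    }

  return fd;
}

/* Reduce one rtnetlink message to the two events of interest:
   RTM_DELLINK (device destroyed) and RTM_NEWADDR (address added).
   Everything else, including truncated messages, maps to EV_NONE. */
struct link_event classify_nlmsg(const struct nlmsghdr *nh)
{
  struct link_event ev = { EV_NONE, 0 };

  assert(nh != NULL);

  if (nh->nlmsg_type == RTM_DELLINK &&
      nh->nlmsg_len >= NLMSG_LENGTH(sizeof(struct ifinfomsg)))
    {
      const struct ifinfomsg *ifi = NLMSG_DATA(nh);
      ev.kind = EV_LINK_GONE;
      ev.ifindex = ifi->ifi_index;
    }
  else if (nh->nlmsg_type == RTM_NEWADDR &&
           nh->nlmsg_len >= NLMSG_LENGTH(sizeof(struct ifaddrmsg)))
    {
      const struct ifaddrmsg *ifa = NLMSG_DATA(nh);
      ev.kind = EV_ADDR_ADDED;
      ev.ifindex = (int)ifa->ifa_index;
    }

  return ev;
}
```

A caller would read messages from the monitor socket, walk them with NLMSG_OK()/NLMSG_NEXT(), and close the DHCP socket when EV_LINK_GONE names the index it is bound to. A re-created device normally gets a new index, so on EV_ADDR_ADDED the caller would probably have to compare by name (for example via if_indextoname()) before re-creating and re-binding the socket.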
Re: [Dnsmasq-discuss] dnsmasq stops receiving packets after network restart
On 27/09/18 14:42, Kristian Evensen wrote:
> Hi Simon,
>
> On Wed, Sep 26, 2018 at 7:30 PM Simon Kelley wrote:
>> Simplest test is to make whichdevice always return NULL, and see if that
>> helps.
>
> Making whichdevice() always return NULL makes the issue go away.
> Without the change, DHCP after a network restart (which triggers
> recreating devices) only works after I manually restart dnsmasq. With
> the change, DHCP works fine. Changing dnsmasq to use two interfaces
> also makes the issue disappear. I unfortunately do not know what has
> suddenly triggered this error. I see that the code in whichdevice() is
> from 2012/2013, so it must be something in a different component.

Progress. AFAIK, the dnsmasq behaviour around this has not changed at all
in that time period. I think it's likely that the change is in the
OpenWRT network infrastructure, maybe hotplug/coldplug stuff that now
destroys and re-creates the kernel-level network device, rather than
just reloading its configuration.

I run the bleeding edge dnsmasq code (we suffer so you don't have to!)
on an old, stable Chaos-calmer OpenWRT install, and I'm not seeing this
effect, which adds weight to the theory that the change is elsewhere.

> Carrying a local patch is no problem for me, but I guess a generic
> solution is desirable. Would a patch adding a configuration option be
> acceptable?

Dnsmasq is quite clever at handling changes in kernel network level
devices under its feet, maybe there's a way to re-bind when that
happens? I'll have a look. A configuration option would be the last
resort here: adding "pull this lever to make it work" options is
something I try and avoid.

Cheers,

Simon.
Re: [Dnsmasq-discuss] dnsmasq stops receiving packets after network restart
Hi Simon,

On Wed, Sep 26, 2018 at 7:30 PM Simon Kelley wrote:
> Simplest test is to make whichdevice always return NULL, and see if that
> helps.

Making whichdevice() always return NULL makes the issue go away.
Without the change, DHCP after a network restart (which triggers
recreating devices) only works after I manually restart dnsmasq. With
the change, DHCP works fine. Changing dnsmasq to use two interfaces
also makes the issue disappear. I unfortunately do not know what has
suddenly triggered this error. I see that the code in whichdevice() is
from 2012/2013, so it must be something in a different component.

Carrying a local patch is no problem for me, but I guess a generic
solution is desirable. Would a patch adding a configuration option be
acceptable?

BR,
Kristian
Re: [Dnsmasq-discuss] dnsmasq stops receiving packets after network restart
On 24/09/18 19:12, Kristian Evensen wrote:
> Hello,
>
> I have some routers running OpenWRT (latest nightly) and that I have
> to access remotely (using reverse SSH). When I restart networking
> (/etc/init.d/network restart), clients on the LAN can no longer obtain
> an IP address using DHCP. If I restart networking locally, DHCP works
> as expected after the network is back up.
>
> In order to try and figure out what is going on, I have checked/tried
> the following:
>
> * I started out by checking if dnsmasq has been restarted and if the
>   DHCP socket has been created. I can always see the socket in netstat.
> * I then took a look at the firewall. I can see the DHCP packets in
>   the INPUT chain in filter, which according to my understanding of
>   Netfilter-internals is the last stop before a packet is delivered to a
>   socket.
> * I then instrumented dnsmasq and added some logging in dhcp_packet()
>   in dhcp.c. This function is never called, as none of my log-messages
>   are written to syslog. I checked that the logging works by checking
>   for my messages when DHCP is working.
> * Restarting dnsmasq makes DHCP work again. I can't see any difference
>   in for example netstat-output.
>
> Does anyone have any idea on what to try or where to look next? After
> having spent a couple of days on this issue, I am quickly starting to
> run out of ideas.

I wonder if this is caused by dnsmasq using the BINDTODEVICE sockopt on
the DHCP socket. If the networking restart takes down and re-creates the
network interface, then that socket may remain bound to the old
interface.

This comment in whichdevice() in dhcp-common.c describes the condition
under which the binding happens.

  /* If we are doing DHCP on exactly one interface, and running linux, do SO_BINDTODEVICE
     to that device. This is for the use case of (eg) OpenStack, which runs a new
     dnsmasq instance for each VLAN interface it creates. Without the BINDTODEVICE,
     individual processes don't always see the packets they should.
     SO_BINDTODEVICE is only available Linux.

     Note that if wildcards are used in --interface, or --interface is not used at all,
     or a configured interface doesn't yet exist, then more interfaces may arrive later,
     so we can't safely assert there is only one interface and proceed.
  */

Simplest test is to make whichdevice always return NULL, and see if that
helps.

Cheers,

Simon.