Re: [Dnsmasq-discuss] dnsmasq stops receiving packets after network restart

2018-09-27 Thread Kristian Evensen
Hi,

On Thu, Sep 27, 2018 at 9:53 PM Simon Kelley wrote:
> Progress. AFAIK, the dnsmasq behaviour around this has not changed at all
> in that time period. I think it's likely that the change is in the
> OpenWRT network infrastructure, maybe hotplug/coldplug stuff that now
> destroys and re-creates the kernel-level network device, rather than
> just reloading its configuration.
>
> I run the bleeding edge dnsmasq code (we suffer so you don't have to!)
> on an old, stable Chaos-calmer OpenWRT install, and I'm not seeing this
> effect, which adds weight to the theory that the change is elsewhere.

Yes, I agree. I also haven't seen this error up until recently, so
there is something else that has broken. I will try to dig a bit when
or if I have time, and see if I can discover something.

> Dnsmasq is quite clever at handling changes in kernel network level
> devices under its feet, maybe there's a way to re-bind when that
> happens? I'll have a look. A configuration option would be the last
> resort here: adding "pull this lever to make it work" options is
> something I try and avoid.

I agree here as well. I checked if there was a socket event we were
missing, but at least no event was received on my boxes. I guess the
most elegant approach would be to monitor RTNLGRP_LINK for DELLINK,
and close the socket when DELLINK arrives. The socket could then be
recreated on NEWLINK, or, probably even better, NEWADDR.
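
A minimal sketch of that idea, assuming a Linux rtnetlink socket. The names
open_link_monitor(), classify_event() and enum rebind_action are made up for
illustration and are not dnsmasq code. The sketch only decides what to do for
a single netlink message; closing and re-creating the DHCP socket is left to
the caller.

```c
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical action codes for illustration; not dnsmasq names. */
enum rebind_action { REBIND_NONE, REBIND_CLOSE, REBIND_REOPEN };

/* Open a netlink socket subscribed to link and IPv4 address events.
   RTMGRP_LINK is the legacy bitmask form of the RTNLGRP_LINK group.
   Returns the descriptor, or -1 on error. */
int open_link_monitor(void)
{
  struct sockaddr_nl addr;
  int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

  if (fd == -1)
    return -1;

  memset(&addr, 0, sizeof(addr));
  addr.nl_family = AF_NETLINK;
  addr.nl_groups = RTMGRP_LINK | RTMGRP_IPV4_IFADDR;

  if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1)
    {
      close(fd);
      return -1;
    }

  return fd;
}

/* Map one netlink message to an action and report the interface index it
   refers to. The caller is assumed to check whether that index (or the
   name obtained from if_indextoname()) matches the DHCP interface. */
enum rebind_action classify_event(const struct nlmsghdr *nh, int *ifindex)
{
  if (nh->nlmsg_type == RTM_DELLINK &&
      nh->nlmsg_len >= NLMSG_LENGTH(sizeof(struct ifinfomsg)))
    {
      const struct ifinfomsg *ifi = NLMSG_DATA(nh);
      *ifindex = ifi->ifi_index;
      return REBIND_CLOSE;      /* device is gone: drop the bound socket */
    }

  if (nh->nlmsg_type == RTM_NEWADDR &&
      nh->nlmsg_len >= NLMSG_LENGTH(sizeof(struct ifaddrmsg)))
    {
      const struct ifaddrmsg *ifa = NLMSG_DATA(nh);
      *ifindex = (int)ifa->ifa_index;
      return REBIND_REOPEN;     /* address configured: safe to bind again */
    }

  return REBIND_NONE;
}
```

A real event loop would recv() on the descriptor and walk the buffer with
NLMSG_OK()/NLMSG_NEXT(), calling classify_event() for each message. Since a
re-created device gets a new interface index, the NEWADDR case would have to
be matched by name rather than by the old index.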

BR,
Kristian

___
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss


Re: [Dnsmasq-discuss] dnsmasq stops receiving packets after network restart

2018-09-27 Thread Simon Kelley
On 27/09/18 14:42, Kristian Evensen wrote:
> Hi Simon,
> 
> On Wed, Sep 26, 2018 at 7:30 PM Simon Kelley wrote:
>> Simplest test is to make whichdevice always return NULL, and see if that
>> helps.
> 
> Making whichdevice() always return NULL makes the issue go away.
> Without the change, DHCP after a network restart (which triggers
> recreating devices) only works after I manually restart dnsmasq. With
> the change, DHCP works fine. Changing dnsmasq to use two interfaces
> also makes the issue disappear. I unfortunately do not know what has
> suddenly triggered this error. I see that the code in whichdevice() is
> from 2012/2013, so it must be something in a different component.


Progress. AFAIK, the dnsmasq behaviour around this has not changed at all
in that time period. I think it's likely that the change is in the
OpenWRT network infrastructure, maybe hotplug/coldplug stuff that now
destroys and re-creates the kernel-level network device, rather than
just reloading its configuration.

I run the bleeding edge dnsmasq code (we suffer so you don't have to!)
on an old, stable Chaos-calmer OpenWRT install, and I'm not seeing this
effect, which adds weight to the theory that the change is elsewhere.
> 
> Carrying a local patch is no problem for me, but I guess a generic
> solution is desirable. Would a patch adding a configuration option be
> acceptable?
> 

Dnsmasq is quite clever at handling changes in kernel network level
devices under its feet, maybe there's a way to re-bind when that
happens? I'll have a look. A configuration option would be the last
resort here: adding "pull this lever to make it work" options is
something I try and avoid.



Cheers,

Simon.



Re: [Dnsmasq-discuss] dnsmasq stops receiving packets after network restart

2018-09-27 Thread Kristian Evensen
Hi Simon,

On Wed, Sep 26, 2018 at 7:30 PM Simon Kelley wrote:
> Simplest test is to make whichdevice always return NULL, and see if that
> helps.

Making whichdevice() always return NULL makes the issue go away.
Without the change, DHCP after a network restart (which triggers
recreating devices) only works after I manually restart dnsmasq. With
the change, DHCP works fine. Changing dnsmasq to use two interfaces
also makes the issue disappear. I unfortunately do not know what has
suddenly triggered this error. I see that the code in whichdevice() is
from 2012/2013, so it must be something in a different component.
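
Why the two-interface setup avoids the problem can be sketched with a
simplified model of the condition described in the whichdevice() comment.
This is not the dnsmasq code: single_bind_device() is a hypothetical name,
and treating any '*' in the name as a wildcard is an assumption made for the
example.

```c
#include <net/if.h>
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical model of the rule in the whichdevice() comment:
   return the one interface name that is safe to pass to SO_BINDTODEVICE,
   or NULL when the socket should be left unbound. */
const char *single_bind_device(const char *const *names, size_t count)
{
  if (count != 1)
    return NULL;        /* no --interface at all, or more than one */

  if (strchr(names[0], '*') != NULL)
    return NULL;        /* wildcard: more interfaces may match later */

  if (if_nametoindex(names[0]) == 0)
    return NULL;        /* configured interface does not exist yet */

  return names[0];
}
```

With two configured interfaces the function returns NULL, so no device
binding is made and there is nothing left to go stale when the device is
re-created.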

Carrying a local patch is no problem for me, but I guess a generic
solution is desirable. Would a patch adding a configuration option be
acceptable?

BR,
Kristian



Re: [Dnsmasq-discuss] dnsmasq stops receiving packets after network restart

2018-09-26 Thread Simon Kelley
On 24/09/18 19:12, Kristian Evensen wrote:
> Hello,
> 
> I have some routers running OpenWRT (latest nightly) and that I have
> to access remotely (using reverse SSH). When I restart networking
> (/etc/init.d/network restart), clients on the LAN can no longer obtain
> an IP address using DHCP. If I restart networking locally, DHCP works
> as expected after the network is back up.
> 
> In order to try and figure out what is going on, I have checked/tried
> the following:
> 
> * I started out by checking if dnsmasq has been restarted and if the
> DHCP socket has been created. I can always see the socket in netstat.
> * I then took a look at the firewall. I can see the DHCP packets in
> the INPUT chain in filter, which according to my understanding of
> Netfilter-internals is the last stop before a packet is delivered to a
> socket.
> * I then instrumented dnsmasq and added some logging in dhcp_packet()
> in dhcp.c. This function is never called, as none of my log-messages
> are written to syslog. I checked that the logging works by checking
> for my messages when DHCP is working.
> * Restarting dnsmasq makes DHCP work again. I can't see any difference
> in for example netstat-output.
> 
> Does anyone have any idea on what to try or where to look next? After
> having spent a couple of days on this issue, I am quickly starting to
> run out of ideas.
> 

I wonder if this is caused by dnsmasq using the BINDTODEVICE sockopt on
the DHCP socket. If the networking restart takes down and re-creates the
network interface, then that socket may remain bound to the old
interface.

This comment in whichdevice() in dhcp-common.c describes the condition
under which the binding happens.

 /* If we are doing DHCP on exactly one interface, and running linux, do SO_BINDTODEVICE
    to that device. This is for the use case of (eg) OpenStack, which runs a new
    dnsmasq instance for each VLAN interface it creates. Without the BINDTODEVICE,
    individual processes don't always see the packets they should.
    SO_BINDTODEVICE is only available Linux.

    Note that if wildcards are used in --interface, or --interface is not used at all,
    or a configured interface doesn't yet exist, then more interfaces may arrive later,
    so we can't safely assert there is only one interface and proceed.
 */

Simplest test is to make whichdevice always return NULL, and see if that
helps.


Cheers,

Simon.

