[Shorewall-users] ARP problems with multiple ipv4 addresses on a physical interface (not aliases) [was: Packets replied to on wrong vlan]

Brian Burch Wed, 04 Nov 2015 01:28:09 -0800

On 22/10/15 00:02, Lennart Sorensen wrote:
> On Wed, Oct 21, 2015 at 06:13:03PM +1000, Brian Burch wrote:
>> I won't confuse anyone with details (in case they are off-topic), but I
>> thought it would be helpful to let you know I've been doing problem
>> determination on what appears to be a similar issue, but with a
>> different configuration.
>>
>> Naturally, I get shorewall log events for traffic between subnets that
>> should NOT be allowed, but nothing is logged for those connections that
>> are allowed. I believe it is not a shorewall problem, but something is
>> going wrong quite low in the stack.
>>
>> The details seem to be frustratingly variable, but I often see redirect
>> log messages to/from the host sending pings. I have many wireshark
>> traces from a mirror port on my switch, but haven't yet spotted the root
>> cause.
>>
>> In my research I found a reference to Linux being built on a "weak end
>> system model" as defined in RFC1122, which apparently "leads to arp
>> problems with multi-homed hosts". I haven't fully understood the
>> theoretical issues yet, so I apologise if my comments are not relevant
>> to your situation. However, in case it is relevant I thought it best to
>> mention quickly.
>
> Do you mean the arp behaviour where it will answer an arp request for
> an address it owns on another interface?  For that we use these settings
> to get sane behaviour on a router:
>
> # Do not answer ARP requests from other interfaces
> net.ipv4.conf.all.arp_ignore=1
> net.ipv4.conf.all.arp_announce=2
>


Thanks for your helpful comment, Lennart. Your suggestion wasn't the 
solution, but it caused me to investigate more thoroughly.

I have solved my problem and confirmed my theory that shorewall was not 
involved. It quickly became clear that my post was not relevant to the 
original problem, so I thought it would be useful to document it as a 
new thread with a more appropriate subject.

On reading about your sysctl parameters, I saw they influence the 
behaviour of multi-homed hosts when acting as routers with more than one 
network card.

My linux firewall is running as a router with two ethernet interfaces, 
but I have not seen any ARP replies sent on the wrong physical 
interface. Nevertheless, I decided to adopt your recommended values for 
these two parameters.

My problem arose because I bought a cheap smart switch and was trying to 
separate my "old" LAN ipv4 subnet into several non-overlapping subnets. 
I want to use the shorewall router to impose different restrictions on 
the traffic between each subnet, and also to the internet.

I am somewhat limited by the VLAN capabilities of the switch (netgear 
gs108ev3), and I tried several potential solutions without any working 
to my satisfaction. Currently, I am using the simplest configuration 
with port-based VLAN (rather than 802.1q).

The shorewall router has multiple ipv4 addresses on the same interface:

3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 1000
     link/ether 00:15:8a:01:91:cc brd ff:ff:ff:ff:ff:ff
     inet 10.1.253.1/24 brd 10.1.253.255 scope global eth1
        valid_lft forever preferred_lft forever
     inet 10.1.252.1/24 brd 10.1.252.255 scope global eth1
        valid_lft forever preferred_lft forever
     inet 10.1.251.1/24 brd 10.1.251.255 scope global eth1
        valid_lft forever preferred_lft forever
     inet 10.1.250.1/24 brd 10.1.250.255 scope global eth1
        valid_lft forever preferred_lft forever
     inet6 fe80::215:8aff:fe01:91cc/64 scope link
        valid_lft forever preferred_lft forever

Obviously, this interface has only one MAC address. The routing table 
is quite simple:

brian@merlot:~$ ip route show
default dev ppp0  scope link
10.1.250.0/24 dev eth1  proto kernel  scope link  src 10.1.250.1
10.1.251.0/24 dev eth1  proto kernel  scope link  src 10.1.251.1
10.1.252.0/24 dev eth1  proto kernel  scope link  src 10.1.252.1
10.1.253.0/24 dev eth1  proto kernel  scope link  src 10.1.253.1
169.254.0.0/16 dev eth0  scope link  metric 1000
172.16.101.0/24 dev eth0  proto kernel  scope link  src 172.16.101.2
172.16.102.0/24 dev eth0  proto kernel  scope link  src 172.16.102.1
202.7.204.35 dev ppp0  proto kernel  scope link  src 60.241.150.231
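
A rough sketch of why this layout invites redirects: every 10.1.25x.0/24 subnet in the table above is reached through eth1, so traffic routed between two of them leaves on the interface it arrived on. As far as I understand the kernel's forwarding path, that is the situation in which it considers sending an ICMP redirect. The function names below are hypothetical, the 169.254.0.0/16 and ppp0 host routes are left out for brevity, and inferring the arrival interface from the source address is a simplification.

```python
import ipaddress

# Connected routes copied from the `ip route show` output: (prefix, device).
ROUTES = [
    ("10.1.250.0/24", "eth1"),
    ("10.1.251.0/24", "eth1"),
    ("10.1.252.0/24", "eth1"),
    ("10.1.253.0/24", "eth1"),
    ("172.16.101.0/24", "eth0"),
    ("172.16.102.0/24", "eth0"),
]


def egress_device(address, routes=ROUTES, default_dev="ppp0"):
    """Return the device of the longest matching prefix, else the default device."""
    ip = ipaddress.ip_address(address)
    best = None
    for prefix, dev in routes:
        net = ipaddress.ip_network(prefix)
        if ip in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, dev)
    return best[1] if best else default_dev


def same_interface_forward(src, dst):
    """True when a packet from src to dst would leave on the device it came in on.

    Simplification: the arrival device is taken to be the device that routes
    back to the source address.
    """
    return egress_device(src) == egress_device(dst)
```

For example, `same_interface_forward("10.1.252.50", "10.1.251.42")` is true (10.1.252.50 is a made-up host address on the test host's subnet), while a destination behind eth0 such as 172.16.102.80 is not.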

The symptoms seen by my hosts were frustratingly hard to replicate, even 
when power-cycling the switches and starting up a link with a new 
address, routing table and empty arp table.

Originally, both the test host and the router had:
net.ipv4.conf.default.secure_redirects = 1
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.default.send_redirects = 0

When the test host pings a device on a subnet that is not directly 
accessible (due to the switch VLAN port mapping rules):

brian@bacchus:~$ ping -c 4 10.1.251.42
PING 10.1.251.42 (10.1.251.42) 56(84) bytes of data.
64 bytes from 10.1.251.42: icmp_seq=1 ttl=63 time=0.995 ms
From 10.1.251.42: icmp_seq=2 Redirect Host(New nexthop: 10.1.251.42)
64 bytes from 10.1.251.42: icmp_seq=2 ttl=63 time=0.749 ms
From 10.1.251.42: icmp_seq=3 Redirect Host(New nexthop: 10.1.251.42)
64 bytes from 10.1.251.42: icmp_seq=3 ttl=63 time=0.861 ms
From 10.1.251.42: icmp_seq=4 Redirect Host(New nexthop: 10.1.251.42)
64 bytes from 10.1.251.42: icmp_seq=4 ttl=63 time=0.855 ms

--- 10.1.251.42 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3002ms
rtt min/avg/max/mdev = 0.749/0.865/0.995/0.087 ms

... but the test host (bacchus) did not follow the redirect, as is shown 
by its ARP table:

brian@bacchus:~$ ip neigh show
10.1.252.200 dev eth0 lladdr 00:18:f3:43:7e:4f STALE
10.1.253.4 dev eth0 lladdr 00:30:18:ac:a9:26 STALE
10.1.252.1 dev eth0 lladdr 00:15:8a:01:91:cc REACHABLE

... as expected, both the source and target hosts were in the router's 
ARP table:

brian@merlot:~$ ip neigh show
10.1.252.200 dev eth1 lladdr 00:18:f3:43:7e:4f REACHABLE
10.1.252.23 dev eth1 lladdr c4:46:19:0a:f5:dd REACHABLE
10.1.252.41 dev eth1 lladdr 00:21:b7:d2:9d:6a STALE
10.1.252.44 dev eth1 lladdr 14:58:d0:fd:73:e7 REACHABLE
172.16.102.80 dev eth0 lladdr 54:78:1a:10:63:62 STALE
10.1.253.4 dev eth1 lladdr 00:30:18:ac:a9:26 STALE
172.16.102.2 dev eth0 lladdr 00:18:84:2c:f8:ae STALE
10.1.252.8 dev eth1 lladdr 00:30:0a:f7:b0:ba STALE
10.1.251.42 dev eth1 lladdr dc:d3:21:66:50:6a REACHABLE
10.1.252.253 dev eth1 lladdr b8:ac:6f:6a:b4:ad REACHABLE
10.1.251.20 dev eth1 lladdr 00:27:02:13:3c:5a STALE

So my problem boiled down to this question: why was my test host 
(bacchus) storing ARP table entries for hosts that were on inaccessible 
subnets? I traced all relevant traffic with wireshark on a mirrored port 
of the smart switch and was astonished to see that my router WAS sending 
redirects, AND my hosts were obeying them... the sysctl variables were 
set, but the values were not being used by the tcp/ip stacks!
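
The check described here, looking for neighbour entries that a host should never have learned, can be automated. The following is only a sketch: the helper name is hypothetical, and the test host's own subnet is assumed to be 10.1.252.0/24, inferred from its gateway entry 10.1.252.1 rather than stated anywhere in this post.

```python
import ipaddress

# `ip neigh show` output from the test host (bacchus), as pasted earlier.
NEIGH_OUTPUT = """\
10.1.252.200 dev eth0 lladdr 00:18:f3:43:7e:4f STALE
10.1.253.4 dev eth0 lladdr 00:30:18:ac:a9:26 STALE
10.1.252.1 dev eth0 lladdr 00:15:8a:01:91:cc REACHABLE
"""


def off_subnet_neighbours(neigh_text, local_subnet):
    """Return neighbour addresses that lie outside the host's own subnet."""
    net = ipaddress.ip_network(local_subnet)
    suspects = []
    for line in neigh_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # not an address line, skip it
        if addr.version == net.version and addr not in net:
            suspects.append(str(addr))
    return suspects


if __name__ == "__main__":
    # Assumed subnet for bacchus; adjust to the real one.
    print(off_subnet_neighbours(NEIGH_OUTPUT, "10.1.252.0/24"))
```

With the table above, the only entry flagged is 10.1.253.4, a host the switch's VLAN rules should have made reachable only through the router.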

My systems run variants of Ubuntu. Because the variables were not being 
set at boot time, I tried calling sysctl in ip_up scripts when the 
individual interfaces were brought up (setting 
net.ipv4.conf.ethx.whatever). As before, the values were set but they 
had no effect. I found some very old Ubuntu/Debian bug reports 
complaining about this behaviour, which also note that the bugs have 
never been properly fixed.

My solution was to set the parameters early enough, and in a form that 
is actually honoured. Instead of setting 
net.ipv4.conf.{default|eth0|eth1}.whatever in the ip_up scripts for the 
respective interfaces, I now set both net.ipv4.conf.all.whatever and 
net.ipv4.conf.default.whatever in a /etc/sysctl.d/60-xxx.conf file. Not 
all the parameters seemed to work on every boot when I coded just "all" 
or just "default". In most cases, stable behaviour was only achieved by 
coding both.
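
One possible explanation for why per-interface values alone had no effect: according to the kernel's ip-sysctl documentation, several of these settings are combined with the "all" value rather than overridden by the interface value. The sketch below models those documented rules in plain Python. The function names are hypothetical, and it is an illustration of the documented combination rules, not a diagnosis of these particular machines.

```python
def effective_send_redirects(all_value, iface_value):
    """Redirects are sent if EITHER conf/all or conf/<iface> enables them."""
    return bool(all_value) or bool(iface_value)


def effective_accept_redirects(all_value, iface_value, forwarding):
    """Redirects are accepted according to the interface's forwarding state.

    With forwarding enabled on the interface, both values must be true;
    with forwarding disabled (an ordinary host), one of them is enough.
    """
    if forwarding:
        return bool(all_value) and bool(iface_value)
    return bool(all_value) or bool(iface_value)


def effective_arp_setting(all_value, iface_value):
    """arp_ignore and arp_announce use the larger of the two values."""
    return max(all_value, iface_value)
```

Under these rules, `effective_send_redirects(1, 0)` is still true, so clearing only net.ipv4.conf.eth1.send_redirects would leave a router sending redirects if the "all" value was still 1. Likewise, a non-forwarding host with `effective_accept_redirects(1, 0, forwarding=False)` would keep obeying them. As I understand it, the "default" values only seed interfaces created after they are written, which may be why setting just one scope behaved inconsistently across boots.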

My router settings are now:

# thanks, Lennart!
net.ipv4.conf.default.arp_announce = 2
net.ipv4.conf.default.arp_ignore = 1

# these seem to be default anyway..
net.ipv4.conf.default.shared_media = 1
net.ipv4.conf.all.shared_media = 1
net.ipv4.conf.default.secure_redirects = 1
net.ipv4.conf.all.secure_redirects = 1

# do not accept redirects from any other router
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv6.conf.default.accept_redirects = 0
net.ipv6.conf.all.accept_redirects = 0

# do not send redirects to any host that is confused
# because it should already have the correct default
# gateway from DHCP or a static definition
net.ipv4.conf.default.send_redirects = 0
net.ipv4.conf.all.send_redirects = 0
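
Since coding both "all" and "default" was what produced stable behaviour, a small consistency check over the sysctl.d file can catch keys that appear under only one of the two scopes. This is a sketch with a hypothetical helper name, and the embedded text simply repeats the settings listed above.

```python
import re

# The router settings from this post, without the comment lines.
ROUTER_CONF = """\
net.ipv4.conf.default.arp_announce = 2
net.ipv4.conf.default.arp_ignore = 1
net.ipv4.conf.default.shared_media = 1
net.ipv4.conf.all.shared_media = 1
net.ipv4.conf.default.secure_redirects = 1
net.ipv4.conf.all.secure_redirects = 1
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv6.conf.default.accept_redirects = 0
net.ipv6.conf.all.accept_redirects = 0
net.ipv4.conf.default.send_redirects = 0
net.ipv4.conf.all.send_redirects = 0
"""

KEY_RE = re.compile(r"^(net\.ipv[46]\.conf)\.(all|default)\.(\w+)\s*=\s*\S+$")


def missing_scope_keys(conf_text):
    """Return the keys that would be needed to pair every all/default setting."""
    seen = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or sysctl.conf comment
        match = KEY_RE.match(line)
        if match:
            family, scope, name = match.groups()
            seen.setdefault((family, name), set()).add(scope)
    missing = []
    for (family, name), scopes in seen.items():
        for scope in {"all", "default"} - scopes:
            missing.append(f"{family}.{scope}.{name}")
    return sorted(missing)


if __name__ == "__main__":
    for key in missing_scope_keys(ROUTER_CONF):
        print("not set:", key)
```

Run against the settings above, it reports net.ipv4.conf.all.arp_announce and net.ipv4.conf.all.arp_ignore as unset. Whether that matters in practice depends on whether the interfaces are created before or after the "default" values are applied.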

My network has been stable and working perfectly for nearly a week.

Thanks again for your suggestion,

Brian
