On 22/10/15 00:02, Lennart Sorensen wrote:
> On Wed, Oct 21, 2015 at 06:13:03PM +1000, Brian Burch wrote:
>> I won't confuse anyone with details (in case they are off-topic), but I
>> thought it would be helpful to let you know I've been doing problem
>> determination on what appears to be a similar issue, but with a
>> different configuration.
>>
>> Naturally, I get shorewall log events for traffic between subnets that
>> should NOT be allowed, but nothing is logged for those connections that
>> are allowed. I believe it is not a shorewall problem, but something is
>> going wrong quite low in the stack.
>>
>> The details seem to be frustratingly variable, but I often see redirect
>> log messages to/from the host sending pings. I have many wireshark
>> traces from a mirror port on my switch, but haven't yet spotted the root
>> cause.
>>
>> In my research I found a reference to Linux being built on a "weak end
>> system model" as defined in RFC1122, which apparently "leads to arp
>> problems with multi-homed hosts". I haven't fully understood the
>> theoretical issues yet, so I apologise if my comments are not relevant
>> to your situation. However, in case it is relevant I thought it best to
>> mention quickly.
>
> Do you mean the arp behaviour where it will answer an arp request for
> an address it owns on another interface? For that we use these settings
> to get sane behaviour on a router:
>
> # Do not answer ARP requests from other interfaces
> net.ipv4.conf.all.arp_ignore=1
> net.ipv4.conf.all.arp_announce=2
>
Thanks for your helpful comment, Lennart. Your suggestion wasn't the
solution, but it caused me to investigate more thoroughly.

I have solved my problem and confirmed my theory that Shorewall was not
involved. It quickly became clear that my post was not relevant to the
original problem, so I thought it would be useful to document it as a
new thread with a more appropriate subject.

On reading about your sysctl parameters, I saw they influence the
behaviour of multi-homed hosts when acting as routers with more than one
network card.

My Linux firewall is running as a router with two Ethernet interfaces,
but I have not seen any ARP replies sent on the wrong physical
interface. Nevertheless, I decided to adopt your recommended values for
these two parameters.
My problem arose because I bought a cheap smart switch and was trying to
separate my "old" LAN IPv4 subnet into several non-overlapping subnets.
I want to use the Shorewall router to impose different restrictions on
the traffic between each subnet, and also to the internet.

I am somewhat limited by the VLAN capabilities of the switch (Netgear
GS108Ev3), and I tried several potential solutions, none of which worked
to my satisfaction. Currently, I am using the simplest configuration
with port-based VLAN (rather than 802.1Q).

The Shorewall router has multiple IPv4 addresses on the same interface:
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 1000
    link/ether 00:15:8a:01:91:cc brd ff:ff:ff:ff:ff:ff
    inet 10.1.253.1/24 brd 10.1.253.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet 10.1.252.1/24 brd 10.1.252.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet 10.1.251.1/24 brd 10.1.251.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet 10.1.250.1/24 brd 10.1.250.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::215:8aff:fe01:91cc/64 scope link
       valid_lft forever preferred_lft forever
Obviously, this interface has only one MAC address. The routing table
is quite simple:
brian@merlot:~$ ip route show
default dev ppp0 scope link
10.1.250.0/24 dev eth1 proto kernel scope link src 10.1.250.1
10.1.251.0/24 dev eth1 proto kernel scope link src 10.1.251.1
10.1.252.0/24 dev eth1 proto kernel scope link src 10.1.252.1
10.1.253.0/24 dev eth1 proto kernel scope link src 10.1.253.1
169.254.0.0/16 dev eth0 scope link metric 1000
172.16.101.0/24 dev eth0 proto kernel scope link src 172.16.101.2
172.16.102.0/24 dev eth0 proto kernel scope link src 172.16.102.1
202.7.204.35 dev ppp0 proto kernel scope link src 60.241.150.231
The symptoms seen by my hosts were frustratingly hard to replicate, even
when power-cycling the switches and starting up a link with a new
address, a new routing table and an empty ARP table.
Originally, both the test host and the router had:
net.ipv4.conf.default.secure_redirects = 1
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.default.send_redirects = 0
When the test host pings a device on a subnet that is not directly
accessible (due to the switch VLAN port mapping rules):
brian@bacchus:~$ ping -c 4 10.1.251.42
PING 10.1.251.42 (10.1.251.42) 56(84) bytes of data.
64 bytes from 10.1.251.42: icmp_seq=1 ttl=63 time=0.995 ms
From 10.1.251.42: icmp_seq=2 Redirect Host(New nexthop: 10.1.251.42)
64 bytes from 10.1.251.42: icmp_seq=2 ttl=63 time=0.749 ms
From 10.1.251.42: icmp_seq=3 Redirect Host(New nexthop: 10.1.251.42)
64 bytes from 10.1.251.42: icmp_seq=3 ttl=63 time=0.861 ms
From 10.1.251.42: icmp_seq=4 Redirect Host(New nexthop: 10.1.251.42)
64 bytes from 10.1.251.42: icmp_seq=4 ttl=63 time=0.855 ms
--- 10.1.251.42 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3002ms
rtt min/avg/max/mdev = 0.749/0.865/0.995/0.087 ms
... but the test host (bacchus) did not follow the redirect, as is shown
by its ARP table:
brian@bacchus:~$ ip neigh show
10.1.252.200 dev eth0 lladdr 00:18:f3:43:7e:4f STALE
10.1.253.4 dev eth0 lladdr 00:30:18:ac:a9:26 STALE
10.1.252.1 dev eth0 lladdr 00:15:8a:01:91:cc REACHABLE
... as expected, both the source and target hosts were in the router's
ARP table:
brian@merlot:~$ ip neigh show
10.1.252.200 dev eth1 lladdr 00:18:f3:43:7e:4f REACHABLE
10.1.252.23 dev eth1 lladdr c4:46:19:0a:f5:dd REACHABLE
10.1.252.41 dev eth1 lladdr 00:21:b7:d2:9d:6a STALE
10.1.252.44 dev eth1 lladdr 14:58:d0:fd:73:e7 REACHABLE
172.16.102.80 dev eth0 lladdr 54:78:1a:10:63:62 STALE
10.1.253.4 dev eth1 lladdr 00:30:18:ac:a9:26 STALE
172.16.102.2 dev eth0 lladdr 00:18:84:2c:f8:ae STALE
10.1.252.8 dev eth1 lladdr 00:30:0a:f7:b0:ba STALE
10.1.251.42 dev eth1 lladdr dc:d3:21:66:50:6a REACHABLE
10.1.252.253 dev eth1 lladdr b8:ac:6f:6a:b4:ad REACHABLE
10.1.251.20 dev eth1 lladdr 00:27:02:13:3c:5a STALE
So my problem boiled down to this question: why was my test host
(bacchus) storing ARP table entries for hosts that were on inaccessible
subnets? I traced all relevant traffic with Wireshark on a mirrored port
of the smart switch and was astonished to see that my router WAS sending
redirects, AND my hosts were obeying them. The sysctl variables were
set, but the values were not being used by the TCP/IP stacks!
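A quick way to spot the same symptom without a full Wireshark trace is
to count the redirect notices in ping's own output. This is only a
sketch: the helper name count_redirects is made up for illustration, and
it assumes the iputils wording "Redirect Host" seen in the transcript
above.

```shell
#!/bin/sh
# count_redirects (hypothetical helper): read ping output on stdin and
# print how many "Redirect Host" notices it contains. Other ping
# implementations may word the notice differently.
count_redirects() {
    # grep -c prints 0 but exits non-zero when nothing matches;
    # "|| true" keeps the function's exit status clean in that case.
    grep -c 'Redirect Host' || true
}
```

Usage on the test host would be something like
"ping -c 4 10.1.251.42 | count_redirects". On the router, the redirects
themselves can be watched with
tcpdump -ni eth1 'icmp[icmptype] = icmp-redirect'
(needs root; eth1 is the LAN interface in my setup).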

My systems run variants of Ubuntu. Because the variables were not being
set at boot time, I tried calling sysctl in ip_up scripts when the
individual interfaces were brought up (setting
net.ipv4.conf.ethx.whatever). As before, the values were set but had no
effect. I found some very old Ubuntu/Debian bug reports that complained
about this behaviour, and it appears the bugs have never been properly
fixed.

My solution was to change where the parameters are set, so that they are
applied in time and actually take effect. Instead of setting
net.ipv4.conf.{default|eth0|eth1}.whatever in the ip_up scripts for the
respective interfaces, I now set both net.ipv4.conf.all.whatever and
net.ipv4.conf.default.whatever in a /etc/sysctl.d/60-xxx.conf file. Not
all the parameters seemed to work on every boot when I coded just "all"
or just "default". Stable behaviour was only achieved by coding both in
most cases.
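A possible explanation for needing both, going by the kernel's
ip-sysctl documentation rather than anything proven by the traces above:
"default" only seeds interfaces created after it is written, and for
send_redirects the kernel ORs the "all" value with the per-interface
value, so eth1.send_redirects=0 has no effect while all.send_redirects
is still 1. The sketch below reports that effective value for each
interface; the function name is made up for illustration.

```shell
#!/bin/sh
# effective_send_redirects ALL_VALUE IFACE_VALUE (hypothetical helper)
# Prints 1 if redirects would be sent on the interface, else 0.
# Assumption taken from the kernel ip-sysctl documentation: redirects
# are sent if at least one of conf/{all,<iface>}/send_redirects is set.
effective_send_redirects() {
    if [ "$1" -ne 0 ] || [ "$2" -ne 0 ]; then
        echo 1
    else
        echo 0
    fi
}

# Report the effective value for every interface the kernel knows about.
# Does nothing on systems where /proc/sys/net is not available.
conf=/proc/sys/net/ipv4/conf
if [ -r "$conf/all/send_redirects" ]; then
    all=$(cat "$conf/all/send_redirects")
    for dir in "$conf"/*/; do
        name=$(basename "$dir")
        case $name in all|default) continue ;; esac
        [ -r "${dir}send_redirects" ] || continue
        val=$(cat "${dir}send_redirects")
        echo "$name: iface=$val all=$all effective=$(effective_send_redirects "$all" "$val")"
    done
fi
```

Note that accept_redirects is combined differently (the same
documentation says it depends on whether forwarding is enabled), so this
helper should not be reused for it as-is.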
My router settings are now:
# thanks, Lennart!
net.ipv4.conf.default.arp_announce = 2
net.ipv4.conf.default.arp_ignore = 1
# these seem to be default anyway..
net.ipv4.conf.default.shared_media = 1
net.ipv4.conf.all.shared_media = 1
net.ipv4.conf.default.secure_redirects = 1
net.ipv4.conf.all.secure_redirects = 1
# do not accept redirects from any other router
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv6.conf.default.accept_redirects = 0
net.ipv6.conf.all.accept_redirects = 0
# do not send redirects to any host that is confused
# because it should already have the correct default
# gateway from DHCP or a static definition
net.ipv4.conf.default.send_redirects = 0
net.ipv4.conf.all.send_redirects = 0
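Since the original trouble was that values looked set but were not in
effect, it may be worth comparing the file against the running kernel
after each boot. The following is a minimal sketch, not a polished tool:
both function names are made up, and the key-to-path mapping is naive
(it assumes no interface name contains a dot, so a VLAN interface such
as eth0.10 would be mapped wrongly).

```shell
#!/bin/sh
# key_to_path KEY (hypothetical helper): map a sysctl key to its
# /proc/sys path by replacing every dot with a slash.
key_to_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}

# check_sysctl_file FILE (hypothetical helper): for each "key = value"
# line, print OK, DIFF or SKIP; return non-zero if any value differs.
check_sysctl_file() {
    status=0
    while IFS= read -r line; do
        # Ignore blank lines and comments.
        case $line in ''|'#'*) continue ;; esac
        key=$(printf '%s' "${line%%=*}" | tr -d ' \t')
        want=$(printf '%s' "${line#*=}" | tr -d ' \t')
        path=$(key_to_path "$key")
        if [ ! -r "$path" ]; then
            echo "SKIP $key (no $path)"
            continue
        fi
        have=$(cat "$path")
        if [ "$have" = "$want" ]; then
            echo "OK   $key = $have"
        else
            echo "DIFF $key: file says $want, kernel has $have"
            status=1
        fi
    done < "$1"
    return $status
}
```

Usage would be "check_sysctl_file /etc/sysctl.d/60-xxx.conf" (the file
name is the placeholder used above). Any DIFF line points at a
parameter that was overridden or never applied.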
My network has been stable and working perfectly for nearly a week.
Thanks again for your suggestion,
Brian