On Sat, Apr 30, 2022 at 09:53:55PM +0100, Ian Chilton wrote:
> Hi Alexandr,
> 
> >     If I understand your set up right, then gw1 is supposed to forward icmp
> >     packets between gw2 and ripe. If it is the case then I would expect
> >     there will be two state entries for ICMP packet:
> >
> 
> Not quite... pings from gw2 -> ripe, goes out of a connected transit
> interface. Reply comes back to gw1, which should be forwarded to gw2.
> However, it's dropped instead of forwarded.

You have asymmetric routing, and with that, stateful firewall rules will
cause you problems. In your case gw1 will block the ICMP reply because it
never encountered the ICMP request matching that reply.

On most of my BGP routers I have either pf disabled or I write the ruleset
so that only local traffic is stateful but all forwarded traffic uses a
no-state rule. IIRC even sloppy-state tracking will block some traffic,
which is why I avoid that option.
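
A minimal sketch of that kind of ruleset, for illustration only. It is not
taken from a real config, and it assumes pf's usual last-matching-rule-wins
evaluation and the default of keeping state on pass rules. A real ruleset
would add whatever filtering and interface restrictions the site needs:

```pf.conf
# Hypothetical example, not the ruleset discussed in this thread.

# Forwarded (transit) traffic: pass without creating state, so a reply
# is not dropped when the matching request left via another router.
pass no state

# Traffic addressed to, or originated by, the router itself: these
# later rules win for local traffic and keep state as usual.
pass in  to (self)
pass out from (self)
```

Because the stateless rule comes first and the local rules come last, only
packets to or from the router's own addresses end up in the state table.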

pf(4) is way more strict when it comes to stateful filtering than most
other firewall products I know.
 
> root@gw1:~# tcpdump -i vlan313 host 193.0.6.138
> tcpdump: listening on vlan313, link-type EN10MB
> 21:42:17.742211 rpki.ripe.net > 172.16.0.91: icmp: echo reply (DF)
> 21:42:18.742022 rpki.ripe.net > 172.16.0.91: icmp: echo reply (DF)
> 21:42:19.742087 rpki.ripe.net > 172.16.0.91: icmp: echo reply (DF)
> 21:42:20.742005 rpki.ripe.net > 172.16.0.91: icmp: echo reply (DF)
> ^C
> 429 packets received by filter
> 0 packets dropped by kernel
> 
> root@gw1:~# route -n get 172.16.0.91
>    route to: 172.16.0.91
> destination: 172.16.0.91
>        mask: 255.255.255.255
>     gateway: 172.16.0.67
>   interface: vlan209
>  if address: 172.16.0.66
>    priority: 32 (ospf)
>       flags: <UP,GATEWAY,DONE,MPATH>
>      use       mtu    expire
>   208537         0         0
> 
> root@gw1:~# tcpdump -i vlan209 host 193.0.6.138
> tcpdump: listening on vlan209, link-type EN10MB
> ^C
> 748 packets received by filter
> 0 packets dropped by kernel
> 
> root@gw1:~# pfctl -ss |grep 193.0.6.138
> all icmp 172.16.0.91:3235 -> 193.0.6.138:8       0:0
> 
> root@the-gw1:~# ping 172.16.0.91
> PING 172.16.0.91 (172.16.0.91): 56 data bytes
> 64 bytes from 172.16.0.91: icmp_seq=0 ttl=255 time=0.380 ms
> 64 bytes from 172.16.0.91: icmp_seq=1 ttl=255 time=0.254 ms
> 
> 
> >         the first 'inbound' state created by inbound packet
> >         the second 'outbound' state created as ICMP request leaves the host.
> >
> >     the 'pfctl -ss' just shows single state here, while there should be two
> >     in fact. But it still does not explain why
> >         pass quick proto { icmp, icmp6 }
> >     fails to create a missing state. if matching state can not be found
> >     for outbound icmp reply on gw1, then the rule above should kick in.
> >
> 
> Interesting...
> 
> If I ping directly from gw1, it works and I only get one state entry:
> 
> root@gw1:~# pfctl -ss |grep 193.0.6.138
> all icmp 172.16.0.90:31109 -> 193.0.6.138:8       0:0
> 
> ichilton@gw1:~$ ping -I 172.16.0.90 rpki.ripe.net
> PING rpki.ripe.net (193.0.6.138): 56 data bytes
> 64 bytes from 193.0.6.138: icmp_seq=0 ttl=252 time=7.726 ms
> 64 bytes from 193.0.6.138: icmp_seq=1 ttl=252 time=7.452 ms
> 64 bytes from 193.0.6.138: icmp_seq=2 ttl=252 time=7.602 ms
> 64 bytes from 193.0.6.138: icmp_seq=3 ttl=252 time=7.499 ms
> 
> 
> 
> >     I wonder how 'pfctl -sI' output looks like. It should report all
> >     network interfaces recognized by pf(4).
> >
> 
> It lists all of the interfaces and groups - long list.
> 
> 
> 
> >     I'm not sure how much busy gw1 is, but it might make some sense to
> >     repeat the test with 'pfctl -x debug', this will make pf(4) more
> >     talkative we might be lucky to get some hits to see what goes wrong.
> 
> 
> It spews a *lot*, but if I grep for that RIPE IP, I get:
> 
> root@the-gw1:~# cat /var/log/messages|grep 193.0.6.138
> Apr 30 21:30:53 gw1 /bsd: pf: key search, in on vlan313: ICMP wire: (0)
> 193.0.6.138:8 172.16.0.91:3235
> Apr 30 21:30:53 gw1 /bsd: pf: key search, out on vlan209: ICMP wire: (0)
> 172.16.0.91:3235 193.0.6.138:8
> Apr 30 21:30:54 gw1 /bsd: pf: key search, in on vlan313: ICMP wire: (0)
> 193.0.6.138:8 172.16.0.91:3235
> Apr 30 21:30:54 gw1 /bsd: pf: key search, out on vlan209: ICMP wire: (0)
> 172.16.0.91:3235 193.0.6.138:8
> Apr 30 21:30:55 gw1 /bsd: pf: key search, in on vlan313: ICMP wire: (0)
> 193.0.6.138:8 172.16.0.91:3235
> Apr 30 21:30:55 gw1 /bsd: pf: key search, out on vlan209: ICMP wire: (0)
> 172.16.0.91:3235 193.0.6.138:8
> 
> This looks confusing - so it looks like it's sending out of vlan209? - but
> the source and dest are reversed???

There is some magic in the state lookup that reverses the key for outgoing
packets so that in and out share a state. Also, the state lookup uses
wire-side and stack-side keys to track NAT and RDR states.

As mentioned above, you cannot use stateful filtering unless you can
ensure that all traffic passes the firewall in both directions.

-- 
:wq Claudio
