On 5/10/19 4:27 PM, Michael McCallister wrote:
> Hi,
> 
> Warning: I am a programmer who is occasionally given network tasks since
> "he takes care of all the computer stuff".
> 
> So - I thought I had my shorewall setup just right - until yesterday
> when I found out that for some reason some connections from Cloudflare
> are not working.  It should be noted that I tried connecting (via curl)
> from 30+ external addresses on different networks (different routes
> coming in - traversing the shorewall firewall - hitting the web server
> on the DNATed LAN - the web server replies and the reply gets sent back
> through the firewall - over WAN - back to curl) - and all 30 times it
> worked just fine as it had in my initial testing.   My first thoughts
> upon seeing this were - is this specific to these Cloudflare
> packets/connections? what is unique about them that causes the
> connection to fail? Or is this perhaps not unique, and it just happens by
> random chance to affect Cloudflare connections (i.e. it works most of the time)?
> 
> So - after a bit of "man tcpdump" reading, here is what I have put
> together so far...
> 
> When a connection does not work, this is what I see on the external
> firewall interface (for privacy parts of the addresses have "DDD"):
> 
> 12:09:58.389001 IP 108.162.DDD.DDD.18556 > 108.170.DDD.DDD.https: Flags
> [S], seq 530242427, win 29200, options [mss
> 1460,nop,nop,sackOK,nop,wscale 10], length 0
> 12:09:59.419289 IP 108.162.DDD.DDD.18556 > 108.170.DDD.DDD.https: Flags
> [S], seq 530242427, win 29200, options [mss
> 1460,nop,nop,sackOK,nop,wscale 10], length 0
> 12:10:01.467223 IP 108.162.DDD.DDD.18556 > 108.170.DDD.DDD.https: Flags
> [S], seq 530242427, win 29200, options [mss
> 1460,nop,nop,sackOK,nop,wscale 10], length 0
> 12:10:05.499524 IP 108.162.DDD.DDD.18556 > 108.170.DDD.DDD.https: Flags
> [S], seq 530242427, win 29200, options [mss
> 1460,nop,nop,sackOK,nop,wscale 10], length 0
> 
> and on the internal firewall interface, I see this:
> 
> 12:09:58.389079 IP 108.162.DDD.DDD.18556 > 10.0.21.10.https: Flags [S],
> seq 530242427, win 29200, options [mss 1460,nop,nop,sackOK,nop,wscale
> 10], length 0
> 12:09:58.389248 IP 10.0.21.10.https > 108.162.DDD.DDD.18556: Flags [S.],
> seq 3740567620, ack 530242428, win 29200, options [mss
> 1460,nop,nop,sackOK,nop,wscale 9], length 0
> 12:09:59.419342 IP 108.162.DDD.DDD.18556 > 10.0.21.10.https: Flags [S],
> seq 530242427, win 29200, options [mss 1460,nop,nop,sackOK,nop,wscale
> 10], length 0
> 12:09:59.419493 IP 10.0.21.10.https > 108.162.DDD.DDD.18556: Flags [S.],
> seq 3740567620, ack 530242428, win 29200, options [mss
> 1460,nop,nop,sackOK,nop,wscale 9], length 0
> 12:10:00.591520 IP 10.0.21.10.https > 108.162.DDD.DDD.18556: Flags [S.],
> seq 3740567620, ack 530242428, win 29200, options [mss
> 1460,nop,nop,sackOK,nop,wscale 9], length 0
> 12:10:01.467274 IP 108.162.DDD.DDD.18556 > 10.0.21.10.https: Flags [S],
> seq 530242427, win 29200, options [mss 1460,nop,nop,sackOK,nop,wscale
> 10], length 0
> 12:10:01.467418 IP 10.0.21.10.https > 108.162.DDD.DDD.18556: Flags [S.],
> seq 3740567620, ack 530242428, win 29200, options [mss
> 1460,nop,nop,sackOK,nop,wscale 9], length 0
> 12:10:03.591510 IP 10.0.21.10.https > 108.162.DDD.DDD.18556: Flags [S.],
> seq 3740567620, ack 530242428, win 29200, options [mss
> 1460,nop,nop,sackOK,nop,wscale 9], length 0
> 12:10:05.499574 IP 108.162.DDD.DDD.18556 > 10.0.21.10.https: Flags [S],
> seq 530242427, win 29200, options [mss 1460,nop,nop,sackOK,nop,wscale
> 10], length 0
> 12:10:05.499698 IP 10.0.21.10.https > 108.162.DDD.DDD.18556: Flags [S.],
> seq 3740567620, ack 530242428, win 29200, options [mss
> 1460,nop,nop,sackOK,nop,wscale 9], length 0
> 
> So this seems to indicate that externally - Cloudflare is sending a syn
> - and just keeps sending it because it never gets the syn ack. 
> Internally, the web server is attempting to send a syn ack back and that
> hits the firewall's internal interface - but it never makes it to the
> external interface - and hence no connection.
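
The comparison described above can be automated. What follows is only a
rough sketch, not a tool from the original thread: it assumes the tcpdump
lines have been re-joined onto one line each (the mail client wrapped them),
and the names syn_acks and lost_syn_acks are made up for illustration. It
matches packets by TCP sequence number, which DNAT normally leaves untouched,
so the same SYN-ACK can be recognised on both interfaces.

```python
import re

# One unwrapped tcpdump line: timestamp, "src > dst:", flags, first seq number.
LINE_RE = re.compile(
    r"^(?P<time>\S+) IP (?P<src>\S+) > (?P<dst>\S+): "
    r"Flags \[(?P<flags>[^\]]+)\], seq (?P<seq>\d+)"
)


def syn_acks(capture: str) -> set[int]:
    """Return the initial sequence numbers of every SYN-ACK ([S.]) in a capture."""
    seqs = set()
    for line in capture.splitlines():
        match = LINE_RE.match(line.strip())
        if match and match.group("flags") == "S.":
            seqs.add(int(match.group("seq")))
    return seqs


def lost_syn_acks(internal: str, external: str) -> set[int]:
    """SYN-ACKs seen on the internal interface but never on the external one."""
    return syn_acks(internal) - syn_acks(external)


# Sample data: the first packets of the failing connection, re-joined by hand.
INTERNAL_FAILED = """\
12:09:58.389079 IP 108.162.DDD.DDD.18556 > 10.0.21.10.https: Flags [S], seq 530242427, win 29200, length 0
12:09:58.389248 IP 10.0.21.10.https > 108.162.DDD.DDD.18556: Flags [S.], seq 3740567620, ack 530242428, win 29200, length 0
"""
EXTERNAL_FAILED = """\
12:09:58.389001 IP 108.162.DDD.DDD.18556 > 108.170.DDD.DDD.https: Flags [S], seq 530242427, win 29200, length 0
12:09:59.419289 IP 108.162.DDD.DDD.18556 > 108.170.DDD.DDD.https: Flags [S], seq 530242427, win 29200, length 0
"""

print(sorted(lost_syn_acks(INTERNAL_FAILED, EXTERNAL_FAILED)))  # [3740567620]
```

A non-empty result means the firewall received the reply on the inside but
did not transmit it on the outside, which is the symptom described above.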
> 
> In "shorewall dump", if I search for 108.162.DDD.DDD it shows under
> "ARP" - 108.162.DDD.DDD dev enp9s0f0  FAILED (enp9s0f0  is the external
> interface that web traffic comes in on).  I think this may be involved
> with the problem... see below about shorewall setup for more on this.
> 
> So I figured this might be as simple as what is different between when
> it works and when it does not...  so here is what I see on a connection
> from Cloudflare that works (external firewall interface):
> 
> 12:29:05.318777 IP 172.68.DDD.DDD.53032 > 108.170.DDD.DDD.https: Flags
> [S], seq 1448564581, win 29200, options [mss
> 1460,nop,nop,sackOK,nop,wscale 10], length 0
> 12:29:05.319088 IP 108.170.DDD.DDD.https > 172.68.DDD.DDD.53032: Flags
> [S.], seq 1881210743, ack 1448564582, win 29200, options [mss
> 1460,nop,nop,sackOK,nop,wscale 9], length 0
> 12:29:05.329789 IP 172.68.DDD.DDD.53032 > 108.170.DDD.DDD.https: Flags
> [.], ack 1, win 29, length 0
> 12:29:05.330191 IP 172.68.DDD.DDD.53032 > 108.170.DDD.DDD.https: Flags
> [P.], seq 1:250, ack 1, win 29, length 249
> 12:29:05.330371 IP 108.170.DDD.DDD.https > 172.68.DDD.DDD.53032: Flags
> [.], ack 250, win 60, length 0
> 12:29:05.345108 IP 108.170.DDD.DDD.https > 172.68.DDD.DDD.53032: Flags
> [P.], seq 1:2717, ack 250, win 60, length 2716
> 12:29:05.355884 IP 172.68.DDD.DDD.53032 > 108.170.DDD.DDD.https: Flags
> [.], ack 2717, win 34, length 0
> 12:29:05.361989 IP 172.68.DDD.DDD.53032 > 108.170.DDD.DDD.https: Flags
> [P.], seq 250:408, ack 2717, win 34, length 158
> 12:29:05.366403 IP 108.170.DDD.DDD.https > 172.68.DDD.DDD.53032: Flags
> [P.], seq 2717:2768, ack 408, win 62, length 51
> 12:29:05.377522 IP 172.68.DDD.DDD.53032 > 108.170.DDD.DDD.https: Flags
> [P.], seq 408:753, ack 2768, win 34, length 345
> 12:29:05.404676 IP 108.170.DDD.DDD.https > 172.68.DDD.DDD.53032: Flags
> [P.], seq 2768:3252, ack 753, win 64, length 484
> 12:29:05.455941 IP 172.68.DDD.DDD.53032 > 108.170.DDD.DDD.https: Flags
> [.], ack 3252, win 37, length 0
> 
> And internal firewall interface:
> 
> 12:29:05.318855 IP 172.68.DDD.DDD.53032 > 10.0.21.10.https: Flags [S],
> seq 1448564581, win 29200, options [mss 1460,nop,nop,sackOK,nop,wscale
> 10], length 0
> 12:29:05.319031 IP 10.0.21.10.https > 172.68.DDD.DDD.53032: Flags [S.],
> seq 1881210743, ack 1448564582, win 29200, options [mss
> 1460,nop,nop,sackOK,nop,wscale 9], length 0
> 12:29:05.329842 IP 172.68.DDD.DDD.53032 > 10.0.21.10.https: Flags [.],
> ack 1, win 29, length 0
> 12:29:05.330240 IP 172.68.DDD.DDD.53032 > 10.0.21.10.https: Flags [P.],
> seq 1:250, ack 1, win 29, length 249
> 12:29:05.330336 IP 10.0.21.10.https > 172.68.DDD.DDD.53032: Flags [.],
> ack 250, win 60, length 0
> 12:29:05.345052 IP 10.0.21.10.https > 172.68.DDD.DDD.53032: Flags [P.],
> seq 1:2717, ack 250, win 60, length 2716
> 12:29:05.355936 IP 172.68.DDD.DDD.53032 > 10.0.21.10.https: Flags [.],
> ack 2717, win 34, length 0
> 12:29:05.362042 IP 172.68.DDD.DDD.53032 > 10.0.21.10.https: Flags [P.],
> seq 250:408, ack 2717, win 34, length 158
> 12:29:05.366356 IP 10.0.21.10.https > 172.68.DDD.DDD.53032: Flags [P.],
> seq 2717:2768, ack 408, win 62, length 51
> 12:29:05.377574 IP 172.68.DDD.DDD.53032 > 10.0.21.10.https: Flags [P.],
> seq 408:753, ack 2768, win 34, length 345
> 12:29:05.404624 IP 10.0.21.10.https > 172.68.DDD.DDD.53032: Flags [P.],
> seq 2768:3252, ack 753, win 64, length 484
> 12:29:05.455994 IP 172.68.DDD.DDD.53032 > 10.0.21.10.https: Flags [.],
> ack 3252, win 37, length 0
> 
> But I see no major differences in the critical first packets (except
> that the Cloudflare connection address is different) - but maybe there
> is?  The initial reply ack packets all have "win 29200, options [mss
> 1460,nop,nop,sackOK,nop,wscale 9], length 0" (whatever that stuff means
> - at least it seems to be the same)
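
For reference, a short note on "that stuff": mss 1460 is the largest TCP
payload the sender will accept per segment (a 1500-byte Ethernet MTU minus
20 bytes of IP header and 20 bytes of TCP header), sackOK offers selective
acknowledgements, nop is padding, and wscale is the window-scale shift. The
win value in a SYN or SYN-ACK is never scaled; in later segments the raw
value is shifted left by the sender's own wscale. A minimal sketch of that
arithmetic, using the values from the working capture (effective_window is
a made-up helper name):

```python
def effective_window(win: int, wscale: int) -> int:
    """Receive window in bytes for a non-SYN segment.

    tcpdump prints the raw 16-bit window field; the real window is that
    value shifted left by the scale the *sender* advertised in its SYN.
    """
    return win << wscale


# Client (172.68.DDD.DDD) advertised wscale 10 and later sent "win 29".
client_window = effective_window(29, 10)
# Server (10.0.21.10) advertised wscale 9 and later sent "win 60".
server_window = effective_window(60, 9)

print(client_window, server_window)  # 29696 30720
```

None of these options differ between the failing and the working handshake,
which fits the observation that the packets themselves look the same.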
> 
> So I tried enabling/disabling options in shorewall/interfaces - disabled
> everything in shorewall/mangle - you know - basic stuff to see if there
> was some setting that would magically fix things - no such luck.
> 
> So, here is a brief summary of the shorewall setup - it's a multi-isp
> setup (sorry - had to be complicated) that also uses "Simple" traffic
> shaping - and in all my tests (where I control/initiate the connection)
> it has worked perfectly.
> 
> shorewall-5.2.3.2-1.el7
> 
> 3.10.0-957.12.1.el7.x86_64
> 
> Inbound web traffic comes in on zone: wan1 interface: enp9s0f0
> 
> shorewall/interfaces:
> 
>> ?FORMAT 2
>> ###############################################################################
>>
>> #ZONE       INTERFACE       OPTIONS
>>
>> # wan1 comes in via pnap's ethernet drop
>> # (default outbound)
>> wan1        enp9s0f1        nosmurfs,rpfilter,sourceroute=0,routefilter=0,tcpflags
>>
>> #  (not published in dns - not used for outbound)
>> wan1        enp9s0f0        nosmurfs,rpfilter,sourceroute=0,routefilter=0,tcpflags
>>
>> # wan2 comes in via pnap's public dc wifi (used for out-of-band and bulk/backups)
>> # 2.4 wifi
>> wan2        eno5            nosmurfs,rpfilter,sourceroute=0,routefilter=0,tcpflags
>> # 5.0 wifi
>> wan2        eno6            nosmurfs,rpfilter,sourceroute=0,routefilter=0,tcpflags
>>
>> # lan2 is just for device management
>> # border switch management (physically same switch where wan connects - just in vlan)
>> lan2        eno4            routeback,routefilter
>>
>> # ipmi and other management (internal switch but on its own vlan to separate)
>> lan2        eno3            routeback,routefilter,dhcp
>>
>> # internal/private switch
>> lan1        eno1            routeback,routefilter,dhcp
> 
> shorewall/tcinterfaces
> 
>> #INTERFACE  TYPE IN_BANDWIDTH OUT_BANDWIDTH
>>
>> enp9s0f1    external    - 100mbit:100kb
>> enp9s0f0    external    - 100mbit:100kb
>> eno5        external    - 10mbit:100kb
>> eno6        external    - 100mbit:100kb
>>
>>
>> eno1        internal    - 100mbit:100kb 
> 
> It is probably worth mentioning that enp9s0f1 enp9s0f0 have different
> addresses / gateways - but plug into the same switch and share that
> switch (all in the same broadcast domain) with the ethernet drop from
> the data center.  So this ethernet drop delivers two address ranges -
> each with its own gateway (two gateway IPs) - but I assume both gateways
> would be the same MAC.
> 
> shorewall/snat
> 
>> SNAT(184.164.DDD.DDD) 0.0.0.0/0 enp9s0f1
>> SNAT(108.170.DDD.DDD)  0.0.0.0/0       enp9s0f0
>> SNAT(192.168.101.2) 0.0.0.0/0       eno6
>> SNAT(192.168.102.2) 0.0.0.0/0       eno5
> 
> shorewall/providers
> 
>> #NAME   NUMBER  MARK  DUPLICATE  INTERFACE  GATEWAY         OPTIONS        COPY
>>
>> pnap1   1       1     -          enp9s0f1   184.164.DDD.1   balance,track  -
>> pnap2   2       2     -          enp9s0f0   108.170.DDD.57  balance,track  -
>> pnap3   3       3     -          eno5       192.168.102.1   balance,track  -
>> pnap4   4       4     -          eno6       192.168.101.1   balance,track  -
> 
> The web rules are defined using the "Web" shorewall macro: Web(DNAT)
> 
> Here is a complete shorewall dump:
> https://drive.google.com/file/d/1tYkHY7EyzzfLINqP4bf6LlW5hbyFEvSl/view?usp=sharing
> 
> 
> So - I think this has to be related to the "FAILED" entry in the
> shorewall dump ARP section.  That means that the external interface does
> not know the MAC address for 108.162.DDD.DDD traffic?  I tried adding
> the MAC address of the gateway to shorewall/providers i.e.
> 184.164.DDD.1,74:8e:f8:92:f3:60 and 108.170.DDD.57,74:8e:f8:92:f3:60 -
> but that made no difference. Is there some way to tell it - "hey send
> everything not assigned here to this MAC"?
> 
> Sorry for the long email - I tried to be descriptive.  I feel like I am
> real close to figuring it out... but I felt that way this morning too -
> so... figured I would run it by the list in case I am doing something
> that is obviously wrong.
> 
> As always - any help is GREATLY appreciated.
> 
>

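The cause of your problem is this wildly wrong route in your main
routing table:

108.0.0.0/8 dev enp9s0f0 proto kernel scope link src 108.170.3.58

With that route, your system believes that every IPv4 address whose
first byte is 108 is on the same LAN as enp9s0f0. When the web server's
SYN-ACK for 108.162.DDD.DDD comes back, the firewall therefore ARPs for
that address on enp9s0f0 instead of handing the packet to your upstream
router. Nothing answers, which is the FAILED entry you saw in the dump,
and the reply is never sent. So attempting to connect from any host in
that network (except your upstream router) will fail, while clients
outside it, such as 172.68.DDD.DDD, work.

The kernel derives that route from the netmask configured on enp9s0f0,
so the address is currently configured with a /8 prefix. Configure it
with the prefix length your provider actually assigned (written NN
below), and the route becomes:

108.170.DDD.DDD/NN dev enp9s0f0 proto kernel scope link src 108.170.DDD.58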
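
To illustrate why only some Cloudflare addresses were affected, here is
a minimal sketch of a longest-prefix route lookup. It is a simplification
that ignores the policy-routing rules a multi-ISP setup adds. The
addresses 108.162.0.1 and 172.68.0.1 are stand-ins for the masked
Cloudflare addresses, and the /29 in the "fixed" table is purely a
hypothetical prefix length; use whatever your provider assigned.

```python
import ipaddress

ON_LINK = "on-link (ARP on enp9s0f0)"
VIA_GATEWAY = "via gateway"


def next_hop(dst: str, routes: list[tuple[str, str]]) -> str:
    """Pick the most specific route containing dst and return its next hop."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, hop in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, hop)
    if best is None:
        raise ValueError(f"no route to {dst}")
    return best[1]


# The table as it is now: the bogus /8 link route plus a default route.
broken = [("108.0.0.0/8", ON_LINK), ("0.0.0.0/0", VIA_GATEWAY)]
# Hypothetical corrected table; /29 is an assumption, not a known value.
fixed = [("108.170.3.56/29", ON_LINK), ("0.0.0.0/0", VIA_GATEWAY)]

for dst in ("108.162.0.1", "172.68.0.1"):
    print(dst, "| broken:", next_hop(dst, broken), "| fixed:", next_hop(dst, fixed))
```

With the broken table, 108.162.0.1 resolves to on-link and the reply is
lost, while 172.68.0.1 goes via the gateway. With a correct prefix both
go via the gateway.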

-Tom
-- 
Tom Eastep        \   Q: What do you get when you cross a mobster with
Shoreline,         \     an international standard?
Washington, USA     \ A: Someone who makes you an offer you can't
http://shorewall.org \   understand
                      \_______________________________________________
