Re: PROBLEM: TPROXY and DNAT broken (bisected to 079096f103fa)

2016-09-06 Thread Florian Westphal
Brandon Cazander  wrote:

[ cc netfilter-devel ]

> Sorry to resurrect this so much later; I just got back from holidays and this 
> was still on my desk.
> 
> Will anyone have another chance to look at this? It appears that the DIVERT 
> rule is not working in our case, and I wonder whether the TPROXY target can be 
> fixed as well, in addition to the socket match fix that Florian provided.

Are there reproducer instructions available for this?

I don't see how TPROXY can be 'fixed': when the skb (the tcp syn) is in
mangle PREROUTING, the nat transformation(s) have not been set up yet.

So the ip header addresses are all we have.

Only the ack that completes the three-way handshake, or retransmitted
syns, will have the post-nat address info available.

The ack should work fine with the (changed) -m socket match, since the
socket should already be in the main ehash table.

The syn should also work fine, because Eric's changes
should not affect the initial listener lookup done by TPROXY.
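
The ordering argument above can be illustrated with a small Python
sketch. It is a toy model, not kernel code: the priority numbers are the
kernel's IPv4 PREROUTING hook priorities, the two addresses come from the
reproducer rules in this thread, and the function name and the single
"nat_mapping" value standing in for the conntrack entry are assumptions
made for illustration.

```python
# Hook priorities from the kernel's IPv4 netfilter priority enum
# (NF_IP_PRI_RAW, NF_IP_PRI_CONNTRACK, NF_IP_PRI_MANGLE, NF_IP_PRI_NAT_DST).
PREROUTING_PRIORITY = {"raw": -300, "conntrack": -200, "mangle": -150, "nat": -100}

ORIG_DADDR = "192.168.7.20"  # -d in the DNAT rule from this thread
DNAT_DADDR = "192.168.8.1"   # --to-destination in the same rule


def traverse_prerouting(nat_mapping):
    """Walk PREROUTING in priority order for one packet.

    nat_mapping is None for the first SYN and the DNAT target for later
    packets of the same connection (a simplified, hypothetical stand-in
    for the conntrack entry). Returns (what mangle saw, mapping afterwards).
    """
    header_daddr = ORIG_DADDR
    seen_in_mangle = None
    # Lower priority value runs first: raw, conntrack, mangle, nat.
    for table in sorted(PREROUTING_PRIORITY, key=PREROUTING_PRIORITY.get):
        if table == "mangle":
            # TPROXY runs here: the header still carries the pre-NAT address.
            seen_in_mangle = {"header": header_daddr, "conntrack": nat_mapping}
        elif table == "nat":
            # DNAT records the mapping and rewrites the header.
            nat_mapping = DNAT_DADDR
            header_daddr = nat_mapping
    return seen_in_mangle, nat_mapping


if __name__ == "__main__":
    syn_view, mapping = traverse_prerouting(None)
    ack_view, _ = traverse_prerouting(mapping)
    print("SYN in mangle:", syn_view)
    print("ACK in mangle:", ack_view)
```

In this model the first syn sees no NAT information at all in mangle,
while the ack sees the pre-NAT header address plus a mapping it could
consult, which is the distinction drawn above.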

> It appears as though nobody else has encountered this regression, so I can 
> appreciate that it comes up pretty low on the priority list. If it is not 
> realistic that this will be looked at further, then we will have to look at 
> replacing TPROXY.

If you already need NAT anyway you can also use -j REDIRECT (or exclude
tproxied packets from nat).


Re: PROBLEM: TPROXY and DNAT broken (bisected to 079096f103fa)

2016-09-06 Thread Brandon Cazander
Sorry to resurrect this so much later; I just got back from holidays and this 
was still on my desk.

Will anyone have another chance to look at this? It appears that the DIVERT 
rule is not working in our case, and I wonder whether the TPROXY target can be 
fixed as well, in addition to the socket match fix that Florian provided.

It appears as though nobody else has encountered this regression, so I can 
appreciate that it comes up pretty low on the priority list. If it is not 
realistic that this will be looked at further, then we will have to look at 
replacing TPROXY.

Thanks for your time.



Re: PROBLEM: TPROXY and DNAT broken (bisected to 079096f103fa)

2016-08-15 Thread Brandon Cazander
I can recreate the issue with these rules:

ip rule add fwmark 1 lookup 100
ip route add local 0.0.0.0/0 dev lo table 100
iptables -t mangle -A PREROUTING -p tcp -m tcp --dport 8080 -j TPROXY --on-port 9876 --on-ip 0.0.0.0 --tproxy-mark 0x1/0x1
iptables -t nat -A PREROUTING -d 192.168.7.20/32 -i eth0 -j DNAT --to-destination 192.168.8.1

If I add in the DIVERT chain it works:

iptables -t mangle -N DIVERT
iptables -t mangle -A PREROUTING -p tcp -m socket -j DIVERT
iptables -t mangle -A DIVERT -j MARK --set-mark 1
iptables -t mangle -A DIVERT -j ACCEPT
iptables -t mangle -A PREROUTING -p tcp -m tcp --dport 8080 -j TPROXY --on-port 9876 --on-ip 0.0.0.0 --tproxy-mark 0x1/0x1

But that's still a regression in my opinion.


Re: PROBLEM: TPROXY and DNAT broken (bisected to 079096f103fa)

2016-08-12 Thread Florian Westphal
Brandon Cazander  wrote:
> Is there anything I can provide or do to help get this issue fixed? Even with 
> the patch provided, our application is still broken.

[..]

> I think that it is worth doing, as the original kernel change broke my user 
> space program and could do the same to others as well.
> 
> On another setup, even with the DIVERT rule in place, I'm still seeing the 
> RST after the ACK. I'm not sure how it is behaving differently than the other 
> setup so I need to look into that. But it definitely worked before the 
> changes to the kernel.

Well, what is different in that setup?
(e.g. rules, application binding different ip address, etc)

Once I can re-create the problem chances for a fix will be a bit
higher...


Re: PROBLEM: TPROXY and DNAT broken (bisected to 079096f103fa)

2016-08-12 Thread Brandon Cazander
Is there anything I can provide or do to help get this issue fixed? Even with 
the patch provided, our application is still broken.

-Brandon


Re: PROBLEM: TPROXY and DNAT broken (bisected to 079096f103fa)

2016-08-03 Thread Brandon Cazander
I think that it is worth doing, as the original kernel change broke my user 
space program and could do the same to others as well.

On another setup, even with the DIVERT rule in place, I'm still seeing the RST 
after the ACK. I'm not sure how it is behaving differently than the other setup 
so I need to look into that. But it definitely worked before the changes to the 
kernel.

From: Florian Westphal <f...@strlen.de>
Sent: Tuesday, August 2, 2016 3:11 PM
To: Brandon Cazander
Cc: Florian Westphal
Subject: Re: PROBLEM: TPROXY and DNAT broken (bisected to 079096f103fa)
    
Brandon Cazander <brandon.cazan...@multapplied.net> wrote:
> > Please try this patch, it makes it work for me again.
> >   I decided to extend the existing snat support in xt_socket.c instead
> >   of changing TPROXY target:
> 
> This fixes my example (with the DIVERT chain), but does not fix the two-line 
> example you gave below. Another setup I have is also still broken as of this 
> diff (similarly, there is a rule in nat PREROUTING that goes to a chain with 
> the TPROXY rule).

Yes, I did not touch the TPROXY target. We would need something similar
there as well (taking the tuple addresses from the conntrack entry) if we
need to make it work without the -m socket rule.
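
As a rough illustration of "take tuple addresses from the conntrack
entry", here is a Python sketch of the idea. It is not the kernel
implementation: the class and function names are hypothetical, the single
nat_done flag is a simplification of the conntrack status bits, and the
addresses are the ones from the original report (client 10.100.0.206,
destination 42.0.1.1:8080 DNAT'ed to 192.168.30.1).

```python
from dataclasses import dataclass
from typing import Optional, Tuple

ORIGINAL, REPLY = 0, 1  # direction indexes, mirroring IP_CT_DIR_ORIGINAL/REPLY

Endpoint = Tuple[str, int]  # (address, port)


@dataclass(frozen=True)
class FlowTuple:
    src: Endpoint
    dst: Endpoint


@dataclass(frozen=True)
class Conntrack:
    """Hypothetical, heavily simplified conntrack entry."""
    tuplehash: Tuple[FlowTuple, FlowTuple]  # (original, reply)
    nat_done: bool


def lookup_daddr(header_dst: Endpoint, ct: Optional[Conntrack],
                 direction: int) -> Endpoint:
    """Pick the destination to use for the socket lookup.

    Without a NAT'ed conntrack entry, fall back to the IP header.
    Otherwise use the source of the tuple in the opposite direction,
    which is where the packet really ends up after NAT.
    """
    if ct is None or not ct.nat_done:
        return header_dst
    return ct.tuplehash[1 - direction].src


def example_conntrack() -> Conntrack:
    client = ("10.100.0.206", 35562)
    original = FlowTuple(src=client, dst=("42.0.1.1", 8080))
    # After DNAT the reply comes from the real server address.
    reply = FlowTuple(src=("192.168.30.1", 8080), dst=client)
    return Conntrack(tuplehash=(original, reply), nat_done=True)


if __name__ == "__main__":
    ct = example_conntrack()
    print(lookup_daddr(("42.0.1.1", 8080), ct, ORIGINAL))
```

The opposite-direction indexing corresponds to the tuplehash[!dir]
expression in the xt_socket patch; whether the same approach would be
acceptable inside the TPROXY target is left open in this thread.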




Re: PROBLEM: TPROXY and DNAT broken (bisected to 079096f103fa)

2016-08-02 Thread Brandon Cazander
> Please try this patch, it makes it work for me again.
>   I decided to extend the existing snat support in xt_socket.c instead
>   of changing TPROXY target:

This fixes my example (with the DIVERT chain), but does not fix the two-line 
example you gave below. Another setup I have is also still broken as of this 
diff (similarly, there is a rule in nat PREROUTING that goes to a chain with 
the TPROXY rule).

> No need, this reproduces easily with this two-line ruleset:
> 
> -t nat -A PREROUTING -d 192.168.7.20/32 -i eth0 -j DNAT --to-destination 192.168.8.1
> -t mangle -A PREROUTING -p tcp -m tcp --dport 8080 -j TPROXY --on-port 9876 --on-ip 0.0.0.0 --tproxy-mark 0x1/0x1

As I said above, this doesn't work, but this does:

-t nat -A PREROUTING -d 192.168.7.20/32 -i eth0 -j DNAT --to-destination 192.168.8.1
-t mangle -N DIVERT
-t mangle -A PREROUTING -p tcp -m socket -j DIVERT
-t mangle -A DIVERT -j MARK --set-mark 1
-t mangle -A DIVERT -j ACCEPT
-t mangle -A PREROUTING -p tcp -m tcp --dport 8080 -j TPROXY --on-port 9876 --on-ip 0.0.0.0 --tproxy-mark 0x1/0x1

Thanks for looking at this so quickly.




Re: PROBLEM: TPROXY and DNAT broken (bisected to 079096f103fa)

2016-07-29 Thread Florian Westphal
Brandon Cazander  wrote:
> * When it fails, no traffic hits the WEBSERVER. A tcpdump on the bad kernel 
> shows:
> root@dons-qemu-new-kernel:~# tcpdump -niany tcp and port 8080
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 
> bytes
> 16:42:31.551952 IP 10.100.0.206.35562 > 42.0.1.1.8080: Flags [S], seq 
> 3793582216, win 29200, options [mss 1460,sackOK,TS val 632068656 ecr 
> 0,nop,wscale 7], length 0
> 16:42:31.551988 IP 42.0.1.1.8080 > 10.100.0.206.35562: Flags [S.], seq 
> 4042636216, ack 3793582217, win 28960, options [mss 1460,sackOK,TS val 745382 
> ecr 632068656,nop,wscale 7], length 0
> 16:42:31.55 IP 10.100.0.206.35562 > 42.0.1.1.8080: Flags [.], ack 1, 
> win 229, options [nop,nop,TS val 632068657 ecr 745382], length 0
> 16:42:31.552238 IP 42.0.1.1.8080 > 10.100.0.206.35562: Flags [R], seq 
> 4042636217, win 0, length 0
> 16:42:31.552246 IP 10.100.0.206.35562 > 42.0.1.1.8080: Flags [P.], seq 
> 1:78, ack 1, win 229, options [nop,nop,TS val 632068657 ecr 745382], length 77
> 16:42:31.552251 IP 42.0.1.1.8080 > 10.100.0.206.35562: Flags [R], seq 
> 4042636217, win 0, length 0
> 16:42:32.551668 IP 42.0.1.1.8080 > 10.100.0.206.35562: Flags [S.], seq 
> 4042636216, ack 3793582217, win 28960, options [mss 1460,sackOK,TS val 745632 
> ecr 632068656,nop,wscale 7], length 0
> 16:42:32.551925 IP 10.100.0.206.35562 > 42.0.1.1.8080: Flags [R], seq 
> 3793582217, win 0, length 0
> 16:42:34.551668 IP 42.0.1.1.8080 > 10.100.0.206.35562: Flags [S.], seq 
> 4042636216, ack 3793582217, win 28960, options [mss 1460,sackOK,TS val 746132 
> ecr 632068656,nop,wscale 7], length 0
> 16:42:34.551995 IP 10.100.0.206.35562 > 42.0.1.1.8080: Flags [R], seq 
> 3793582217, win 0, length 0

Please try this patch, it makes it work for me again.
I decided to extend the existing snat support in xt_socket.c instead
of changing TPROXY target:

diff --git a/net/netfilter/xt_socket.c b/net/netfilter/xt_socket.c
--- a/net/netfilter/xt_socket.c
+++ b/net/netfilter/xt_socket.c
@@ -144,6 +144,44 @@ static bool xt_socket_sk_is_transparent(struct sock *sk)
}
 }
 
+static void get_lookup_daddr(const struct sk_buff *skb, u32 *daddr, u16 *dport)
+{
+#ifdef XT_SOCKET_HAVE_CONNTRACK
+    const struct iphdr *iph = ip_hdr(skb);
+    enum ip_conntrack_info ctinfo;
+    enum ip_conntrack_dir dir;
+    struct nf_conn const *ct;
+
+    /* Do the lookup with the original socket address in
+     * case this is a packet of an SNAT-ted connection.
+     */
+    ct = nf_ct_get(skb, &ctinfo);
+    if (!ct || nf_ct_is_untracked(ct))
+        return;
+
+    if ((ct->status & IPS_SRC_NAT_DONE) == 0)
+        return;
+
+    dir = CTINFO2DIR(ctinfo);
+    switch (iph->protocol) {
+    case IPPROTO_ICMP:
+        if (ctinfo != IP_CT_RELATED_REPLY)
+            return;
+        break;
+    case IPPROTO_TCP:
+        *dport = ct->tuplehash[!dir].tuple.src.u.tcp.port;
+        break;
+    case IPPROTO_UDP:
+        *dport = ct->tuplehash[!dir].tuple.src.u.udp.port;
+        break;
+    default:
+        return;
+    }
+
+    *daddr = ct->tuplehash[!dir].tuple.src.u3.ip;
+#endif
+}
+
+
 static struct sock *xt_socket_lookup_slow_v4(struct net *net,
                                              const struct sk_buff *skb,
                                              const struct net_device *indev)
@@ -154,10 +192,6 @@ static struct sock *xt_socket_lookup_slow_v4(struct net *net,
     __be32 uninitialized_var(daddr), uninitialized_var(saddr);
     __be16 uninitialized_var(dport), uninitialized_var(sport);
     u8 uninitialized_var(protocol);
-#ifdef XT_SOCKET_HAVE_CONNTRACK
-    struct nf_conn const *ct;
-    enum ip_conntrack_info ctinfo;
-#endif
 
     if (iph->protocol == IPPROTO_UDP || iph->protocol == IPPROTO_TCP) {
         struct udphdr _hdr, *hp;
@@ -185,25 +219,7 @@ static struct sock *xt_socket_lookup_slow_v4(struct net *net,
         return NULL;
     }
 
-#ifdef XT_SOCKET_HAVE_CONNTRACK
-    /* Do the lookup with the original socket address in
-     * case this is a reply packet of an established
-     * SNAT-ted connection.
-     */
-    ct = nf_ct_get(skb, &ctinfo);
-    if (ct && !nf_ct_is_untracked(ct) &&
-        ((iph->protocol != IPPROTO_ICMP &&
-          ctinfo == IP_CT_ESTABLISHED_REPLY) ||
-         (iph->protocol == IPPROTO_ICMP &&
-          ctinfo == IP_CT_RELATED_REPLY)) &&
-        (ct->status & IPS_SRC_NAT_DONE)) {
-
-        daddr = ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u3.ip;
-        dport = (iph->protocol == IPPROTO_TCP) ?
-            ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u.tcp.port :
-   

Re: PROBLEM: TPROXY and DNAT broken (bisected to 079096f103fa)

2016-07-28 Thread Florian Westphal
Brandon Cazander  wrote:
> Hopefully that's enough detail to replicate this issue. I have the full 
> environment set up for both working and non-working kernel versions, so 
> please let me know if there's anything else I can provide.

No need, this reproduces easily with this two-line ruleset:

-t nat -A PREROUTING -d 192.168.7.20/32 -i eth0 -j DNAT --to-destination 192.168.8.1
-t mangle -A PREROUTING -p tcp -m tcp --dport 8080 -j TPROXY --on-port 9876 --on-ip 0.0.0.0 --tproxy-mark 0x1/0x1

AFAIU the problem is this:

SYN:
1. -j TPROXY finds listen sk, redirects to it
2. DNAT takes place (iphdr(skb)->daddr is mangled).
3. tcp stack puts request sk into ehash table.

Note that the ehash entry uses the updated/dnatted address.

ACK:
1. -j TPROXY finds no established or request socket, since it uses
   iph->daddr but the ehash contains the dnatted-to address,
   so we redirect to the listener socket.

Before the ehash change, when an skb reached the listen sk the kernel
searched both the listener socket's request queue and the ehash table,
using the iphdr daddr (which at this point is the DNAT'ed address).
That is why this used to work: the lookup returned the request sk.

After the ehash change we only check for a syn cookie and then
emit a reset.
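
The SYN/ACK sequence described above can be walked through with a toy
Python model. This is only a sketch under stated assumptions: the tables
are plain dictionaries rather than the kernel's data structures, the
client address and port are made up, and the request_in_ehash flag is a
hypothetical switch standing in for "kernel with or without commit
079096f103fa". The two destination addresses are the ones from the
two-line ruleset.

```python
from dataclasses import dataclass, field

ORIG_DADDR = "192.168.7.20"   # address the client dials (-d in the DNAT rule)
DNAT_DADDR = "192.168.8.1"    # --to-destination in the DNAT rule
CLIENT = ("10.0.0.5", 40000)  # hypothetical client address and port
DPORT = 8080


@dataclass
class Stack:
    """Toy model of the socket tables; not the kernel's data structures."""
    request_in_ehash: bool  # True models kernels that contain 079096f103fa
    ehash: dict = field(default_factory=dict)
    listener_queue: dict = field(default_factory=dict)

    def handle_syn(self):
        # 1. TPROXY picks the listener. 2. DNAT rewrites daddr.
        # 3. The request sk is stored under the rewritten address.
        key = (CLIENT, (DNAT_DADDR, DPORT))
        if self.request_in_ehash:
            self.ehash[key] = "request_sk"
        else:
            self.listener_queue[key] = "request_sk"

    def handle_ack(self):
        # TPROXY looks up with the address still in the IP header (pre-DNAT).
        tproxy_key = (CLIENT, (ORIG_DADDR, DPORT))
        if tproxy_key in self.ehash:
            return self.ehash[tproxy_key]
        # No match, so the skb goes to the listener. By the time the
        # listener handles it, DNAT has rewritten daddr.
        listener_key = (CLIENT, (DNAT_DADDR, DPORT))
        if not self.request_in_ehash:
            # Old behaviour: the listener searched its request queue and
            # the ehash table with the DNAT'ed address.
            found = (self.listener_queue.get(listener_key)
                     or self.ehash.get(listener_key))
            if found:
                return found
        # New behaviour: only a syn cookie check, which fails here.
        return "RST"


def handshake(request_in_ehash):
    stack = Stack(request_in_ehash=request_in_ehash)
    stack.handle_syn()
    return stack.handle_ack()


if __name__ == "__main__":
    print("before the ehash change:", handshake(False))
    print("after the ehash change: ", handshake(True))
```

In this model the only difference between the two runs is where the
request sk is stored and whether the listener performs a second lookup,
which is enough to turn a completed handshake into a reset.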

Eric, AFAICS the only solution for this is to extend
TPROXY and obtain the lookup saddr/daddr info from the conntrack
entry instead of the ip headers, which should make this work again.

Do you agree?
Any other suggestions?

Thanks!


Re: PROBLEM: TPROXY and DNAT broken (bisected to 079096f103fa)

2016-07-27 Thread Eric Dumazet
On Wed, 2016-07-27 at 18:19 +, Brandon Cazander wrote:
> [1.] One line summary of the problem:
> Using TPROXY together with a DNAT rule (working on older kernels) fails to 
> work on newer kernels as of commit 079096f103fa
> 
> [2.] Full description of the problem/report:
> I performed a git bisect using a qemu image to test my example below, and the 
> bisect ended at this commit:
> 
> > commit 079096f103faca2dd87342cca6f23d4b34da8871
> > Author: Eric Dumazet 
> > Date:   Fri Oct 2 11:43:32 2015 -0700
> > 
> > tcp/dccp: install syn_recv requests into ehash table
> 
> [3.] Keywords: networking
> 
> [4.] Kernel information
> [4.1.] Kernel version (from /proc/version):
> Everything as of commit 079096f103fa (tested up to 4.5.0)
> 
> [4.2.] Kernel .config file:
> When performing the bisect, I built with make oldconfig. Let me know if you 
> want the whole .config file.
> 
> [5.] Most recent kernel version which did not have the bug:
> Any kernel that I built prior to commit 
> 079096f103faca2dd87342cca6f23d4b34da8871 did not have this issue.
> 
> [6.] no Oops
> 
> [7.] A small shell script or example program which triggers the
>  problem (if possible)
> 
> I have produced what I hope is a minimal example, using the instructions for 
> TPROXY from 
> http://lxr.linux.no/#linux+v3.10/Documentation/networking/tproxy.txt and an 
> example transparent TCP proxy written in C that I found at 
> https://github.com/kristrev/tproxy-example.
> 
> * I have a machine ("ROUTER") with 10.100.0.164/24 on eth0, and 
> 192.168.30.2/24 on eth1. This is running the tproxy-example program, with the 
> following rules:
> iptables -t mangle -N DIVERT
> iptables -t mangle -A PREROUTING -p tcp -m socket -j DIVERT
> iptables -t mangle -A DIVERT -j MARK --set-mark 1
> iptables -t mangle -A DIVERT -j ACCEPT
> iptables -t mangle -A PREROUTING -p tcp --dport 8080 -j TPROXY 
> --tproxy-mark 0x1/0x1 --on-port 9876
> iptables -t nat -I PREROUTING -i eth0 -d 42.0.1.1 -j DNAT --to-dest 
> 192.168.30.1
> ip rule add fwmark 1 lookup 100
> ip route add local 0.0.0.0/0 dev lo table 100
> 
> * There is a machine ("WEBSERVER") at 192.168.30.1/24 hosting a webserver on 
> port 8080.
> 
> * My workstation is at 10.100.0.206, and I have a static route for both 
> 192.168.30.2 and 42.0.1.1 via 10.100.0.164.
> 
> * Making a curl request to 192.168.30.2:8080 hits the transparent proxy and 
> works in both GOOD (before the aforementioned commit) kernel, and BAD (at the 
> commit or later) kernel.
> 
> * Making a curl request to 42.0.1.1:8080 hits the transparent proxy and works 
> in GOOD kernel but in BAD kernel I get:
> "curl: (56) Recv failure: Connection reset by peer"
> 
> * When it fails, no traffic hits the WEBSERVER. A tcpdump on the bad kernel 
> shows:
> root@dons-qemu-new-kernel:~# tcpdump -niany tcp and port 8080
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 
> bytes
> 16:42:31.551952 IP 10.100.0.206.35562 > 42.0.1.1.8080: Flags [S], seq 
> 3793582216, win 29200, options [mss 1460,sackOK,TS val 632068656 ecr 
> 0,nop,wscale 7], length 0
> 16:42:31.551988 IP 42.0.1.1.8080 > 10.100.0.206.35562: Flags [S.], seq 
> 4042636216, ack 3793582217, win 28960, options [mss 1460,sackOK,TS val 745382 
> ecr 632068656,nop,wscale 7], length 0
> 16:42:31.55 IP 10.100.0.206.35562 > 42.0.1.1.8080: Flags [.], ack 1, 
> win 229, options [nop,nop,TS val 632068657 ecr 745382], length 0
> 16:42:31.552238 IP 42.0.1.1.8080 > 10.100.0.206.35562: Flags [R], seq 
> 4042636217, win 0, length 0
> 16:42:31.552246 IP 10.100.0.206.35562 > 42.0.1.1.8080: Flags [P.], seq 
> 1:78, ack 1, win 229, options [nop,nop,TS val 632068657 ecr 745382], length 77
> 16:42:31.552251 IP 42.0.1.1.8080 > 10.100.0.206.35562: Flags [R], seq 
> 4042636217, win 0, length 0
> 16:42:32.551668 IP 42.0.1.1.8080 > 10.100.0.206.35562: Flags [S.], seq 
> 4042636216, ack 3793582217, win 28960, options [mss 1460,sackOK,TS val 745632 
> ecr 632068656,nop,wscale 7], length 0
> 16:42:32.551925 IP 10.100.0.206.35562 > 42.0.1.1.8080: Flags [R], seq 
> 3793582217, win 0, length 0
> 16:42:34.551668 IP 42.0.1.1.8080 > 10.100.0.206.35562: Flags [S.], seq 
> 4042636216, ack 3793582217, win 28960, options [mss 1460,sackOK,TS val 746132 
> ecr 632068656,nop,wscale 7], length 0
> 16:42:34.551995 IP 10.100.0.206.35562 > 42.0.1.1.8080: Flags [R], seq 
> 3793582217, win 0, length 0
> 
> * A tcpdump on a GOOD kernel shows:
> root@dons-qemu-old-kernel:~# tcpdump -niany tcp and port 8080
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 
> bytes
> 16:44:18.364537 IP 10.100.0.206.35996 > 42.0.1.1.8080: Flags [S], seq 
> 3963646692, win 29200, options [mss 1460,sackOK,TS val 632175966 ecr 

PROBLEM: TPROXY and DNAT broken (bisected to 079096f103fa)

2016-07-27 Thread Brandon Cazander
[1.] One line summary of the problem:
Using TPROXY together with a DNAT rule (working on older kernels) fails to work 
on newer kernels as of commit 079096f103fa

[2.] Full description of the problem/report:
I performed a git bisect using a qemu image to test my example below, and the 
bisect ended at this commit:

> commit 079096f103faca2dd87342cca6f23d4b34da8871
> Author: Eric Dumazet 
> Date:   Fri Oct 2 11:43:32 2015 -0700
> 
> tcp/dccp: install syn_recv requests into ehash table

[3.] Keywords: networking

[4.] Kernel information
[4.1.] Kernel version (from /proc/version):
Everything as of commit 079096f103fa (tested up to 4.5.0)

[4.2.] Kernel .config file:
When performing the bisect, I built with make oldconfig. Let me know if you 
want the whole .config file.

[5.] Most recent kernel version which did not have the bug:
Any kernel that I built prior to commit 
079096f103faca2dd87342cca6f23d4b34da8871 did not have this issue.

[6.] no Oops

[7.] A small shell script or example program which triggers the
 problem (if possible)

I have produced what I hope is a minimal example, using the instructions for 
TPROXY from 
http://lxr.linux.no/#linux+v3.10/Documentation/networking/tproxy.txt and an 
example transparent TCP proxy written in C that I found at 
https://github.com/kristrev/tproxy-example.

* I have a machine ("ROUTER") with 10.100.0.164/24 on eth0, and 192.168.30.2/24 
on eth1. This is running the tproxy-example program, with the following rules:
iptables -t mangle -N DIVERT
iptables -t mangle -A PREROUTING -p tcp -m socket -j DIVERT
iptables -t mangle -A DIVERT -j MARK --set-mark 1
iptables -t mangle -A DIVERT -j ACCEPT
iptables -t mangle -A PREROUTING -p tcp --dport 8080 -j TPROXY --tproxy-mark 0x1/0x1 --on-port 9876
iptables -t nat -I PREROUTING -i eth0 -d 42.0.1.1 -j DNAT --to-dest 192.168.30.1
ip rule add fwmark 1 lookup 100
ip route add local 0.0.0.0/0 dev lo table 100

* There is a machine ("WEBSERVER") at 192.168.30.1/24 hosting a webserver on 
port 8080.

* My workstation is at 10.100.0.206, and I have a static route for both 
192.168.30.2 and 42.0.1.1 via 10.100.0.164.

* Making a curl request to 192.168.30.2:8080 hits the transparent proxy and 
works on both the GOOD kernel (before the aforementioned commit) and the BAD 
kernel (at the commit or later).

* Making a curl request to 42.0.1.1:8080 hits the transparent proxy and works 
on the GOOD kernel, but on the BAD kernel I get:
"curl: (56) Recv failure: Connection reset by peer"

* When it fails, no traffic hits the WEBSERVER. A tcpdump on the bad kernel 
shows:
root@dons-qemu-new-kernel:~# tcpdump -niany tcp and port 8080
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes
16:42:31.551952 IP 10.100.0.206.35562 > 42.0.1.1.8080: Flags [S], seq 3793582216, win 29200, options [mss 1460,sackOK,TS val 632068656 ecr 0,nop,wscale 7], length 0
16:42:31.551988 IP 42.0.1.1.8080 > 10.100.0.206.35562: Flags [S.], seq 4042636216, ack 3793582217, win 28960, options [mss 1460,sackOK,TS val 745382 ecr 632068656,nop,wscale 7], length 0
16:42:31.55 IP 10.100.0.206.35562 > 42.0.1.1.8080: Flags [.], ack 1, win 229, options [nop,nop,TS val 632068657 ecr 745382], length 0
16:42:31.552238 IP 42.0.1.1.8080 > 10.100.0.206.35562: Flags [R], seq 4042636217, win 0, length 0
16:42:31.552246 IP 10.100.0.206.35562 > 42.0.1.1.8080: Flags [P.], seq 1:78, ack 1, win 229, options [nop,nop,TS val 632068657 ecr 745382], length 77
16:42:31.552251 IP 42.0.1.1.8080 > 10.100.0.206.35562: Flags [R], seq 4042636217, win 0, length 0
16:42:32.551668 IP 42.0.1.1.8080 > 10.100.0.206.35562: Flags [S.], seq 4042636216, ack 3793582217, win 28960, options [mss 1460,sackOK,TS val 745632 ecr 632068656,nop,wscale 7], length 0
16:42:32.551925 IP 10.100.0.206.35562 > 42.0.1.1.8080: Flags [R], seq 3793582217, win 0, length 0
16:42:34.551668 IP 42.0.1.1.8080 > 10.100.0.206.35562: Flags [S.], seq 4042636216, ack 3793582217, win 28960, options [mss 1460,sackOK,TS val 746132 ecr 632068656,nop,wscale 7], length 0
16:42:34.551995 IP 10.100.0.206.35562 > 42.0.1.1.8080: Flags [R], seq 3793582217, win 0, length 0

* A tcpdump on a GOOD kernel shows:
root@dons-qemu-old-kernel:~# tcpdump -niany tcp and port 8080
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes
16:44:18.364537 IP 10.100.0.206.35996 > 42.0.1.1.8080: Flags [S], seq 3963646692, win 29200, options [mss 1460,sackOK,TS val 632175966 ecr 0,nop,wscale 7], length 0
16:44:18.364571 IP 42.0.1.1.8080 > 10.100.0.206.35996: Flags [S.], seq 4117262662, ack 3963646693, win 14480, options [mss 1460,sackOK,TS val 4294903654 ecr 632175966,nop,wscale 7], length 0
16:44:18.364819 IP 10.100.0.206.35996 >