Re: openvswitch conntrack and nat problem in first packet reply with RST

2017-03-14 Thread wenxu

You are correct! Thanks very much.

It works. Here is a new example:

ip,in_port=2 actions=ct(table=1,zone=1,nat)
ip,in_port=3 actions=ct(table=1,zone=1,nat)

table=1, ct_state=+new+trk,tcp,in_port=2,tp_dst=123 actions=ct(commit,zone=1,nat(src=2.2.1.7)),output:3
table=1, ct_state=+new+trk,icmp,in_port=2 actions=ct(commit,zone=1,nat(src=2.2.1.7)),output:3
table=1, ct_state=+new+trk,ip,in_port=3 actions=ct(commit,zone=1,nat(dst=192.168.0.7)),output:2
table=1, ct_state=+new+trk, priority=100, tcp,in_port=3,tp_dst=123 actions=drop
table=1, ct_state=+est+trk,ip,in_port=3 actions=output:2
table=1, ct_state=+est+trk,ip,in_port=2 actions=output:3






Re: openvswitch conntrack and nat problem in first packet reply with RST

2017-03-14 Thread Joe Stringer

How would you know to not kill the entry? How would you ensure it's
properly cleaned up later? I'm not sure if there's a way to implement
this without some fairly serious plumbing.

If you look at the examples in the OVS testsuite[0], it is suggested
to use "ct(nat)" with no options early in your rules. This ensures
that the connection is looked up, and if necessary, NAT is applied at
the same time - meaning that the RST can be NATed back AND the
connection is deleted. In the later table you need to differentiate
the connections based on whether they were already statefully NATed or
not. For new connections, it would be handled by your rule #3 (which
would then perform the nat as part of that rule's actions). For
existing connections, the packet is already NATed by the time it
reaches table 1, and your rules 4-5 shouldn't need to apply the nat.
If you still need access to the original tuple for matching purposes,
the new 'ct_nw_src', 'ct_nw_dst', etc. fields[1] will provide the
original ct 5-tuple. Note, however, that those are only available on OVS
master and should be part of OVS 2.8.
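
Applied to the five rules from the original post, the advice above might
look like the following sketch. It is an illustration, not a tested
configuration: the bridge name br0, the port numbers and the 2.2.1.7
address come from the thread, while the add_flow/install_flows helpers and
the RUN, BRIDGE, ZONE and SNAT_IP variables are made up for this example.
By default the script only prints the ovs-ofctl commands; setting RUN=1
would execute them.

```shell
#!/bin/sh
# Sketch only: helper and variable names are hypothetical.
# Dry-run by default (prints commands); RUN=1 actually calls ovs-ofctl.
BRIDGE=${BRIDGE:-br0}
ZONE=${ZONE:-1}
SNAT_IP=${SNAT_IP:-2.2.1.7}

add_flow() {
    if [ "${RUN:-0}" = 1 ]; then
        ovs-ofctl add-flow "$BRIDGE" "$1"
    else
        printf "ovs-ofctl add-flow %s '%s'\n" "$BRIDGE" "$1"
    fi
}

install_flows() {
    # Table 0: look up the connection and apply any existing NAT in one step.
    add_flow "ip,in_port=1 actions=ct(table=1,zone=$ZONE,nat)"
    add_flow "ip,in_port=2 actions=ct(table=1,zone=$ZONE,nat)"
    # Table 1: new connections are committed together with the SNAT rule.
    add_flow "table=1,ct_state=+new+trk,tcp,in_port=1,tp_dst=123 actions=ct(commit,zone=$ZONE,nat(src=$SNAT_IP)),output:2"
    # Table 1: established traffic was already translated in table 0.
    add_flow "table=1,ct_state=+est+trk,ip,in_port=2 actions=output:1"
    add_flow "table=1,ct_state=+est+trk,ip,in_port=1 actions=output:2"
}

install_flows
```

The established-state rules no longer carry a nat() action, because the
ct(nat) in table 0 has already translated the packet by the time it
reaches table 1.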

[0] https://github.com/openvswitch/ovs/blob/branch-2.7/tests/system-traffic.at#L2331
[1] http://openvswitch.org/support/dist-docs/ovs-fields.7.html


openvswitch conntrack and nat problem in first packet reply with RST

2017-03-13 Thread wenxu
Hi all,

This is a simple test of conntrack and NAT in Open vSwitch. I want to build
a stateful firewall with conntrack and then do NAT.

netns1: port 1, IP 10.0.0.7
netns2: port 2, IP 1.1.1.7

Traffic from netns1 (10.0.0.7) is source-NATed to 2.2.1.7 when it accesses
netns2 (1.1.1.7).

1. # ovs-ofctl add-flow br0 'ip,in_port=1 actions=ct(table=1,zone=1)'
2. # ovs-ofctl add-flow br0 'ip,in_port=2 actions=ct(table=1,zone=1)'
3. # ovs-ofctl add-flow br0 'table=1, ct_state=+new+trk,tcp,in_port=1,tp_dst=123 actions=ct(commit,zone=1,nat(src=2.2.1.7)),output:2'
4. # ovs-ofctl add-flow br0 'table=1, ct_state=+est+trk,ip,in_port=2 actions=ct(commit,zone=1,nat(dst=10.0.0.7)),output:1'
5. # ovs-ofctl add-flow br0 'table=1, ct_state=+est+trk,ip,in_port=1 actions=ct(commit,zone=1,nat(src=2.2.1.7)),output:2'


I found that netns1 can reach 1.1.1.7:123 when something is listening on
port 123 at 1.1.1.7 in netns2.

But if nothing is listening on port 123, the first RST packet sent in reply
by 1.1.1.7 (when there is no kernel datapath flow yet) is not DNATed back to
10.0.0.7. The second RST packet is fine (by then the kernel datapath flow
installed for the first RST packet exists).

# tcpdump -i eth0 -nnn
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
14:44:13.575200 IP 10.0.0.7.39891 > 1.1.1.7.123: Flags [S], seq 93585, win 29200, options [mss 1460,sackOK,TS val 584707316 ecr 0,nop,wscale 7], length 0
14:44:13.576036 IP 1.1.1.7.123 > 2.2.1.7.39891: Flags [R.], seq 0, ack 93586, win 0, length 0

But the datapath flows are correct:
# ovs-dpctl dump-flows
recirc_id(0),in_port(7),eth_type(0x0800),ipv4(frag=no), packets:0, bytes:0, used:never, actions:ct(zone=1),recirc(0x5a)
recirc_id(0x5a),in_port(7),ct_state(+new+trk),eth_type(0x0800),ipv4(proto=6,frag=no),tcp(dst=123), packets:0, bytes:0, used:never, actions:ct(commit,zone=1,nat(src=2.2.1.7)),8
recirc_id(0),in_port(8),eth_type(0x0800),ipv4(frag=no), packets:0, bytes:0, used:never, actions:ct(zone=1),recirc(0x5b)
recirc_id(0x5b),in_port(8),ct_state(-new+est+trk),eth_type(0x0800),ipv4(frag=no), packets:0, bytes:0, used:never, actions:ct(commit,zone=1,nat(dst=10.0.0.7)),7


I think the problem lies in the interaction between PACKET-OUT and the RST
packet.

There are two packet-outs, one for rule 2 and one for rule 4. The rule 2
pass goes through conntrack, which sees that the packet is an RST and
deletes the conntrack entry. As a result, the second pass (from rule 4)
cannot find the conntrack entry it needs to do the DNAT.
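
The sequence described above can be mimicked with a toy model. This is
only a sketch of the reasoning, not of real kernel or OVS behaviour: the
single CT_ENTRY flag, the ct_lookup helper and the two scenario functions
are invented for illustration, and only the addresses come from this
thread. It contrasts the two-lookup pipeline of rules 2 and 4 with a
single lookup that applies NAT at the same time, as suggested in the
reply in this thread.

```shell
#!/bin/sh
# Toy model only: one conntrack entry, one RST reply, no real kernel logic.
ORIG_SRC=10.0.0.7   # client address before SNAT (from the thread)
NAT_SRC=2.2.1.7     # client address after SNAT (from the thread)

# ct_lookup MODE: look up the entry for the RST whose destination is PKT_DST.
# MODE "nat" also reverses the SNAT. An RST seen as the only reply deletes
# the entry, mirroring the nf_ct_kill_acct() call quoted in this message.
ct_lookup() {
    [ "$CT_ENTRY" = present ] || return 0   # no entry: nothing to translate
    if [ "$1" = nat ]; then
        PKT_DST=$ORIG_SRC                   # translate reply back to client
    fi
    CT_ENTRY=gone                           # the RST kills the entry
}

# Rules 2 and 4: ct() in table 0, then ct(commit,nat(dst=...)) in table 1.
two_lookups() {
    CT_ENTRY=present; PKT_DST=$NAT_SRC
    ct_lookup plain   # table 0: entry found, then deleted by the RST
    ct_lookup nat     # table 1: entry already gone, so no NAT happens
    echo "$PKT_DST"
}

# Alternative: ct(nat) in table 0, plain output in table 1.
one_lookup() {
    CT_ENTRY=present; PKT_DST=$NAT_SRC
    ct_lookup nat     # lookup, NAT and deletion happen in one pass
    echo "$PKT_DST"
}

echo "two lookups: RST delivered to $(two_lookups)"
echo "one lookup:  RST delivered to $(one_lookup)"
```

In this model the two-lookup case leaves the RST addressed to 2.2.1.7,
which matches the tcpdump output above, while the single-lookup case
delivers it to 10.0.0.7.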

In the netfilter/nf_conntrack_proto_tcp.c file:

if (!test_bit(IPS_SEEN_REPLY_BIT, &ct->status)) {
        /* If only reply is a RST, we can consider ourselves not to
           have an established connection: this is a fairly common
           problem case, so we can delete the conntrack
           immediately.  --RR */
        if (th->rst) {
                nf_ct_kill_acct(ct, ctinfo, skb);
                return NF_ACCEPT;
        }
}


A switch should be added to keep this conntrack entry from being deleted:

 if (!test_bit(IPS_SEEN_REPLY_BIT, &ct->status)) {
         /* If only reply is a RST, we can consider ourselves not to
            have an established connection: this is a fairly common
            problem case, so we can delete the conntrack
            immediately.  --RR */
-        if (th->rst) {
+        if (th->rst && !nf_ct_tcp_rst_no_kill) {
                 nf_ct_kill_acct(ct, ctinfo, skb);
                 return NF_ACCEPT;
         }


BR
wenxu