Hi Jan,

As discussed and agreed at OVSCON, I submitted a patch to bring the userspace connection tracker's established state in line with that of the kernel. The patch is similar to the one I suggested earlier in this thread; I also added a test and made some documentation updates.
Some of the discussion in this thread was orthogonal to bringing userspace ‘established’ in line with kernel ‘established’, but it appears to have been useful: some new recommendations on conntrack pipeline design practices may come out of it.

Thanks
Darrell

From: Jan Scheurich <jan.scheur...@ericsson.com>
Date: Saturday, November 4, 2017 at 4:54 AM
To: Darrel Ball <db...@vmware.com>, Rohith Basavaraja <rohith.basavar...@ericsson.com>
Cc: "d...@openvswitch.org" <d...@openvswitch.org>
Subject: RE: [ovs-discuss] Conntrack issue in OVS (2.6)+DPDK

Hi Darrell,

The example pipeline I crafted was not meant to be a realistic conntrack application, but to demonstrate the semantic differences between the userspace and kernel implementations and to discuss our problems with the current documentation.

I fully agree with your proposal for a proper ICMP rule set. It would work equally for the kernel and userspace datapaths. But there are other rule sets where they behave differently, and we believe this is not good.

The original pipeline brought up by Rohith in August is the implementation of OpenStack Security Groups in OpenDaylight. In general, ODL does not commit connections in the untrusted direction. However, in the problematic scenario (two Neutron ports in the same Neutron Network but in different Security Groups, co-located on the same OVS instance), the connection was committed (as trusted) on the sending side. The packet should have been dropped on the receiving side, but the ct() lookup for the first packet on egress hits the committed connection and passes, because ODL uses one conntrack zone per Neutron Network rather than per Security Group. I think this is wrong, and using one zone per Security Group would probably solve this specific issue.
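To make the zone argument concrete, here is a minimal C sketch, assuming a toy conntrack table in which the zone is part of the lookup key alongside the addresses and protocol. Real entries also key on ports or ICMP id and distinguish uncommitted from committed entries; this model ignores both. All names are hypothetical, and zone 5022 is an illustrative per-Security-Group zone, not one taken from the thread. An entry committed in one zone is invisible to a lookup in another zone, so the receiving side would see the first packet as new.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced conntrack key: the zone is part of the key. */
struct ct_key {
    uint16_t zone;
    uint32_t src_ip;
    uint32_t dst_ip;
    uint8_t proto;
};

#define MAX_CONNS 16

/* Toy table holding only committed entries, stored in orig direction. */
struct ct_table {
    struct ct_key conns[MAX_CONNS];
    size_t n;
};

static bool ct_key_eq(const struct ct_key *a, const struct ct_key *b)
{
    return a->zone == b->zone && a->src_ip == b->src_ip
           && a->dst_ip == b->dst_ip && a->proto == b->proto;
}

/* A lookup hits if a committed entry matches in the orig or the reply
 * direction, but only within the same zone. */
static bool ct_lookup(const struct ct_table *t, const struct ct_key *k)
{
    struct ct_key rev = { k->zone, k->dst_ip, k->src_ip, k->proto };

    for (size_t i = 0; i < t->n; i++) {
        if (ct_key_eq(&t->conns[i], k) || ct_key_eq(&t->conns[i], &rev)) {
            return true;
        }
    }
    return false;
}

/* ct(commit): persist the entry unless it already exists or the toy
 * table is full. */
static void ct_commit(struct ct_table *t, const struct ct_key *k)
{
    if (t->n < MAX_CONNS && !ct_lookup(t, k)) {
        t->conns[t->n++] = *k;
    }
}
```

The only design point the sketch makes is that the zone participates in key equality; with one zone per network both sides share entries, with one zone per Security Group they do not.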
But with the kernel datapath this issue never surfaced, because the connection is not considered established before the first reply packet, so the second lookup of the first packet on egress still yields +new-est. So the ODL developers testing with the kernel datapath assumed their design was suitable. You can argue that this was a misunderstanding of the function, but the discrepancy between the documentation and the kernel behavior certainly didn’t help.

Perhaps it is better to continue this discussion in person during the OVS conference?

Regards,
Jan

From: Darrell Ball [mailto:db...@vmware.com]
Sent: Saturday, 04 November, 2017 01:47
To: Jan Scheurich <jan.scheur...@ericsson.com>; Rohith Basavaraja <rohith.basavar...@ericsson.com>
Cc: d...@openvswitch.org
Subject: Re: [ovs-discuss] Conntrack issue in OVS (2.6)+DPDK

From: Jan Scheurich <jan.scheur...@ericsson.com>
Date: Friday, November 3, 2017 at 6:22 AM
To: Darrel Ball <db...@vmware.com>, Rohith Basavaraja <rohith.basavar...@ericsson.com>
Cc: "d...@openvswitch.org" <d...@openvswitch.org>
Subject: RE: [ovs-discuss] Conntrack issue in OVS (2.6)+DPDK

Hi Darrell,

I have now been able to actually test the example pipelines I provided earlier. It turns out that the first one I sent was correct.

[Darrell] Sure, let us discuss the first example; let me know if you want to discuss the second example you gave as well.

Please note that it was not meant as a realistic conntrack pipeline

[Darrell] Again, I agree your example is not realistic or recommended. No one would write rules like this. The rules would certainly be written so that the trusted direction (the one that does the commit) allows the first packet through; this is a fundamental principle of conntrack. There are an infinite number of ways to misuse conntrack rules, and no one can prevent misuse.
On a similar topic, another fundamental problem I saw with the original discussion (from August) is a conntrack pipeline that commits a connection in the untrusted direction. That is also not something we do or recommend others do. This ‘suboptimal design approach’ raised the question of when a packet gets labelled as ESTABLISHED. Normally the difference would not be noticed, since a connection would not be committed in the untrusted direction, and hence EST would not be possible unless another rule correctly commits in the trusted direction. I’ll add more comments below.

but just to demonstrate the misalignment between userspace and kernel conntrack and the conflict of both with the documentation. The following pipeline is now tested:

ovs-ofctl add-flow br0 "table=0,priority=10,in_port=1,icmp actions=ct(table=1,zone=5000)"
ovs-ofctl add-flow br0 "table=0,priority=10,in_port=1,arp actions=output:2"
ovs-ofctl add-flow br0 "table=0,priority=10,in_port=2 actions=output:1"
ovs-ofctl add-flow br0 "table=1,priority=10,in_port=1,ct_state=-new+est-rel-inv+trk actions=output:2"
ovs-ofctl add-flow br0 "table=1,priority=10,in_port=1,ip,ct_state=+new+trk actions=ct(commit,zone=5000),goto_table:2"
ovs-ofctl add-flow br0 "table=2,priority=10,in_port=1,ct_state=-new+est-rel-inv+trk actions=output:2"

The ct(commit) action in table 1 commits a new connection entry, but the subsequent match in table 2 proves that the ct_state of the packet is still not EST despite the commit.

>>>>>>>>>>>>>
[Darrell] I assume you meant “lack of match in table 2”, per your test result below. You use zoning in your rules without effect, and you even split the pipeline with goto_table; we would not do this. Of the 6 rules, probably at least 4 are not what you want. I think there is a big disconnect here, and I feel we are wasting time discussing such a contrived pipeline.
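The test pipeline above can be replayed with a small C model, assuming two hypothetical semantics: the pre-patch userspace behavior described in this thread (a lookup that hits a committed entry yields est) and the kernel behavior (est only once a reply has been seen). It also assumes, consistent with the dump in which the table 2 flow never matched, that the packet keeps the ct_state from the table 0 lookup after ct(commit,zone=5000). Replies from port 2 bypass conntrack in this pipeline, so reply_seen is never set. The names are illustrative, not OVS code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the two behaviors being compared. */
enum ct_semantics { USERSPACE_PRE_PATCH, KERNEL };

/* Reduced state of the single ICMP connection in zone 5000. */
struct conn_model {
    bool committed;   /* ct(commit) has run for this connection */
    bool reply_seen;  /* conntrack has looked up a reply packet */
};

/* Runs one ICMP request from port 1 through tables 0-2.
 * Returns true if the packet is output to port 2, false if dropped. */
static bool run_pipeline(struct conn_model *c, enum ct_semantics s)
{
    /* table 0: ct(table=1,zone=5000) is a lookup only. */
    bool est = c->committed
               && (s == USERSPACE_PRE_PATCH || c->reply_seen);

    /* table 1: ct_state=-new+est-rel-inv+trk -> output:2 */
    if (est) {
        return true;
    }

    /* table 1: ip,ct_state=+new+trk -> ct(commit,zone=5000),goto_table:2.
     * The packet keeps its +new state, so the est match in table 2
     * misses and the packet is dropped. */
    c->committed = true;
    return false;
}
```

Two calls per semantics reproduce the observed 50% loss for ping -c2 on userspace, and match Jan's expectation that the kernel datapath drops every request because conntrack never sees the reply.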
Here is a simplified set of rules that might reasonably be used for ICMP alone:

priority=1,action=drop
priority=10,arp,action=normal
table=0,priority=10,in_port=2,icmp,ct_state=-trk,action=ct(table=0)
table=0,priority=10,in_port=2,icmp,ct_state=+trk+est actions=output:1
table=0,priority=10,in_port=1,icmp actions=ct(commit,table=1)
table=1,priority=10,in_port=1,icmp,ct_state=+trk+est actions=output:2
table=1,priority=10,in_port=1,icmp,ct_state=+trk+new actions=output:2
<<<<<<<<<<<<<

This contradicts the statement in man ovs-fields: “est (0x02) Part of an existing connection. Set to 1 if this is a committed connection.”

>>>>>>>>>
[Darrell] No, it does not. Same answer as earlier:

[Darrell] Let me clear up some misconceptions: ct(commit) is a prerequisite for EST being set on a later packet. A ‘packet’ (not a connection) that hits an existing conntrack entry is marked as established, and that’s what the documentation says; the idea behind the ‘OVS conntrack EST state’ is to make the state intuitive.

“est (0x02) Part of an existing connection. Set to 1 if this is a committed connection.”

ESTABLISHED is an attribute of a packet hitting an existing conntrack entry (“Part of an existing connection”), not of the conntrack entry itself. So a packet that hits an ‘existing’ entry (which is, of course, a committed connection) gets its state set to EST. I agree this is subtle, because the reader has to know that EST is a state of the packet, not of the connection entry itself; the wording could have been better.
<<<<<<<<<

Consequently, the userspace datapath drops the first ICMP packet:

root@ubuntu:~# ip netns exec ns1 ping -c1 192.168.10.20
PING 192.168.10.20 (192.168.10.20) 56(84) bytes of data.

--- 192.168.10.20 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

root@ubuntu:/opt/ovs# ovs-ofctl -Oopenflow13 dump-flows br0
cookie=0x0, duration=30.885s, table=0, n_packets=1, n_bytes=98, reset_counts priority=10,icmp,in_port="br0-ns1" actions=ct(table=1,zone=5000)
cookie=0x0, duration=30.848s, table=0, n_packets=0, n_bytes=0, reset_counts priority=10,arp,in_port="br0-ns1" actions=output:"br0-ns2"
cookie=0x0, duration=30.815s, table=0, n_packets=0, n_bytes=0, reset_counts priority=10,in_port="br0-ns2" actions=output:"br0-ns1"
cookie=0x0, duration=30.783s, table=1, n_packets=0, n_bytes=0, reset_counts priority=10,ct_state=-new+est-rel-inv+trk,in_port="br0-ns1" actions=output:"br0-ns2"
cookie=0x0, duration=30.746s, table=1, n_packets=1, n_bytes=98, reset_counts priority=10,ct_state=+new+trk,ip,in_port="br0-ns1" actions=ct(commit,zone=5000),resubmit(,2)
cookie=0x0, duration=30.712s, table=2, n_packets=0, n_bytes=0, reset_counts priority=10,ct_state=-new+est-rel-inv+trk,in_port="br0-ns1" actions=output:"br0-ns2"

root@ubuntu:/opt/ovs# ovs-appctl dpctl/dump-flows
recirc_id(0),in_port(2),packet_type(ns=0,id=0),eth_type(0x0806), packets:0, bytes:0, used:never, actions:3
recirc_id(0),in_port(3),packet_type(ns=0,id=0),eth_type(0x0806), packets:0, bytes:0, used:never, actions:2
ct_state(+new-est-rel-inv+trk),recirc_id(0x2),in_port(2),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:0, bytes:0, used:never, actions:ct(commit,zone=5000)
recirc_id(0),in_port(2),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(proto=1,frag=no), packets:0, bytes:0, used:never, actions:ct(zone=5000),recirc(0x2)

root@ubuntu:/opt/ovs# ovs-appctl dpctl/dump-conntrack
icmp,orig=(src=192.168.10.10,dst=192.168.10.20,id=20697,type=8,code=0),reply=(src=192.168.10.20,dst=192.168.10.10,id=20697,type=0,code=0),zone=5000

But when I send two ICMP packets in a row, the second packet hits the connection entry committed by the first (dropped) packet and goes through:
root@ubuntu:~# ip netns exec ns1 ping -c2 192.168.10.20
PING 192.168.10.20 (192.168.10.20) 56(84) bytes of data.
64 bytes from 192.168.10.20: icmp_seq=2 ttl=64 time=1.87 ms

--- 192.168.10.20 ping statistics ---
2 packets transmitted, 1 received, 50% packet loss, time 1006ms
rtt min/avg/max/mdev = 1.874/1.874/1.874/0.000 ms

root@ubuntu:/opt/ovs# ovs-ofctl -Oopenflow13 dump-flows br0
cookie=0x0, duration=40.727s, table=0, n_packets=2, n_bytes=196, reset_counts priority=10,icmp,in_port="br0-ns1" actions=ct(table=1,zone=5000)
cookie=0x0, duration=40.696s, table=0, n_packets=1, n_bytes=42, reset_counts priority=10,arp,in_port="br0-ns1" actions=output:"br0-ns2"
cookie=0x0, duration=40.667s, table=0, n_packets=2, n_bytes=140, reset_counts priority=10,in_port="br0-ns2" actions=output:"br0-ns1"
cookie=0x0, duration=40.631s, table=1, n_packets=1, n_bytes=98, reset_counts priority=10,ct_state=-new+est-rel-inv+trk,in_port="br0-ns1" actions=output:"br0-ns2"
cookie=0x0, duration=40.602s, table=1, n_packets=1, n_bytes=98, reset_counts priority=10,ct_state=+new+trk,ip,in_port="br0-ns1" actions=ct(commit,zone=5000),resubmit(,2)
cookie=0x0, duration=40.566s, table=2, n_packets=0, n_bytes=0, reset_counts priority=10,ct_state=-new+est-rel-inv+trk,in_port="br0-ns1" actions=output:"br0-ns2"

root@ubuntu:/opt/ovs# ovs-appctl dpctl/dump-flows
recirc_id(0),in_port(2),packet_type(ns=0,id=0),eth_type(0x0806), packets:0, bytes:0, used:never, actions:3
recirc_id(0),in_port(3),packet_type(ns=0,id=0),eth_type(0x0806), packets:0, bytes:0, used:never, actions:2
recirc_id(0),in_port(2),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(proto=1,frag=no), packets:1, bytes:98, used:4.149s, actions:ct(zone=5000),recirc(0x4)
ct_state(+new-est-rel-inv+trk),recirc_id(0x4),in_port(2),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:0, bytes:0, used:never, actions:ct(commit,zone=5000)
ct_state(-new+est-rel-inv+trk),recirc_id(0x4),in_port(2),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:0, bytes:0, used:never, actions:3
recirc_id(0),in_port(3),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:0, bytes:0, used:never, actions:2

The ct() lookup in table 0 for subsequent packets sets the packet’s ct_state to EST regardless of whether conntrack has seen reply packets.

[Darrell] Let me clear up some misconceptions: ct(commit) is a prerequisite for EST being set on a later packet. A ‘packet’ (not a connection) that hits an existing conntrack entry is marked as established, and that’s what the documentation says; the idea behind the ‘OVS conntrack EST state’ is to make the state intuitive.

Well, I would question whether the current definition of the EST state in the OVS documentation is intuitive. It certainly fooled us ;-) And also the ODL developers, who based their Security Group pipeline design on the exhibited behavior of the kernel datapath.

I’d find it much more intuitive if the ct_state of a packet reflected the state of the tracked connection at the end of the last ct() action. Directly after a commit of a new connection, it should still be NEW. Only when a connection is really ‘established’ should it change to EST.

When a connection counts as established actually depends on the protocol type. For ICMP and UDP (other), it is indeed the lookup of the first reply packet, when the corresponding entries enter the states ICMPS_REPLY and OTHERS_BIDIR, respectively. For TCP, the trigger should be the lookup of a valid SYN/ACK packet from the remote side.

[Darrell] Bringing userspace ESTABLISHED in line with kernel ESTABLISHED is trivial; for 2.8/master, the change is below (for 2.6 it is similar, but a few lines shorter).

I’m not sure that your simple patch, which just checks the reply direction as a prerequisite for all protocols, is sufficient. We’d rather suggest basing the EST ct state on the actual connection state as a result of xxx_conn_update().
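Jan's proposed, protocol-dependent definition of ‘established’ can be written down as a small C predicate. This is a sketch under stated assumptions: the conn_view fields are hypothetical summaries of what the conntrack modules track (the real code keeps states such as ICMPS_REPLY, OTHERS_BIDIR and the TCP state machine instead), and the flag values follow the ovs-fields bit assignments quoted later in the thread (new 0x01, est 0x02, trk 0x20).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ct_state bits as documented in ovs-fields. */
#define CT_NEW 0x01u
#define CT_EST 0x02u
#define CT_TRK 0x20u

enum l4_proto { PROTO_TCP, PROTO_UDP, PROTO_ICMP };

/* Hypothetical summary of one tracked connection. */
struct conn_view {
    enum l4_proto proto;
    bool committed;          /* ct(commit) has been executed */
    bool reply_seen;         /* a reply-direction packet was looked up */
    bool synack_from_remote; /* TCP only: valid SYN/ACK from the responder */
};

/* Proposed rule: committed AND the protocol-specific condition holds. */
static bool conn_established(const struct conn_view *c)
{
    if (!c->committed) {
        return false;
    }
    switch (c->proto) {
    case PROTO_TCP:
        return c->synack_from_remote;
    case PROTO_UDP:
    case PROTO_ICMP:
        return c->reply_seen;
    }
    return false;
}

/* ct_state a tracked packet would carry after the last ct() under the
 * proposal: new until established, then est (rel/rpl/inv omitted). */
static uint32_t proposed_ct_state(const struct conn_view *c)
{
    return CT_TRK | (conn_established(c) ? CT_EST : CT_NEW);
}
```

Note that under this rule a TCP reply alone (for example, one that is not a valid SYN/ACK) does not make the connection established, which is the difference from a purely reply-direction check.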
[Darrell]
1/ As I mentioned before, EST is a packet state; a prerequisite for EST is a committed connection.
2/ The TCP conn update code understands what valid is and tracks ACKs in this regard.
3/ For UDP and ICMP, I also don’t intend to conflate ESTABLISHED with VALID.

That may require a new return value (CT_UPDATE_UNCHANGED) in these functions that leaves the ct state of the packet unchanged. The conntrack modules (_tcp, _icmp, _other) would then only return CT_UPDATE_VALID when the connection is established.

[Darrell] No, I disagree; VALID and ESTABLISHED are very different, and I will not conflate them. Packets can be valid without being marked as established.

[Darrell] The OVS documented definition of ESTABLISHED is actually better, but I don’t think that is very important, and I think most users will not care or even notice the difference.

We find the documented OVS definition rather confusing. Now, after all our discussions and tests, I tend to agree that the current implementation of userspace conntrack is actually very straightforward and could be described in simple terms, but since it does not match the kernel datapath behavior, which in our view sets the reference, that won’t help.

[Darrell] Actually, the contrived example (raison d'être) you first provided in August is based on committing a connection in the untrusted direction; as mentioned earlier, we don’t do that, and we don’t recommend others do it either.

Here’s a proposal for an improved description of ct_state in ovs-fields:

Connection Tracking State Field

Name:            ct_state
Width:           32 bits
Format:          ct state
Masking:         arbitrary bitwise masks
Prerequisites:   none
Access:          read-only
OpenFlow 1.0:    not supported
OpenFlow 1.1:    not supported
OXM:             none
NXM:             NXM_NX_CT_STATE (105) since Open vSwitch 2.5

This field holds several flags that can be used to determine the state of the connection to which the packet belongs. It is initially zero and updated every time a ct() action is executed.
It reflects the state of the packet and of its associated connection, if any, at completion of the ct() action. Only committed connections are tracked.

Matches on this field are most conveniently written in terms of symbolic names (listed below), each preceded by either + for a flag that must be set, or - for a flag that must be unset, without any other delimiters between the flags. Flags not mentioned are wildcarded. For example, tcp,ct_state=+trk-new matches TCP packets that have been run through the connection tracker and do not establish a new connection. Matches can also be written as flags/mask, where flags and mask are 32-bit numbers in decimal or in hexadecimal prefixed by 0x.

The following flags are defined:

new (0x01)
    A new connection. Set to 1 if there is no committed connection for the packet yet, or if the committed connection is not yet fully established.

est (0x02)
    Part of an established connection. Set to 1 if there is a committed connection for the packet and the connection is fully established. A TCP connection is established when the connection tracker has seen the SYN-ACK from the destination. For UDP and ICMP, the connection is established when the connection tracker has seen the first reply packet.

rel (0x04)
    Related to an existing connection, e.g. an ICMP “destination unreachable” message or an FTP data connection. This flag will only be 1 if the connection to which this one is related is committed. Connections identified as rel are separate from the originating connection and must be committed separately. All packets for a related connection will have the rel flag set, not just the initial packet.

rpl (0x08)
    This packet is in the reply direction, meaning that it is in the opposite direction from the packet that initiated the connection. This flag will only be 1 if the connection is committed.

inv (0x10)
    The state is invalid, meaning that the connection tracker couldn’t identify the connection.
    This flag is a catch-all for problems in the connection or the connection tracker, such as:
    · L3/L4 protocol handler is not loaded/unavailable. With the Linux kernel datapath, this may mean that the nf_conntrack_ipv4 or nf_conntrack_ipv6 modules are not loaded.
    · L3/L4 protocol handler determines that the packet is malformed.
    · Packets are unexpected length for protocol.

trk (0x20)
    This packet is tracked, meaning that it has previously traversed the connection tracker. If this flag is not set, then no other flags will be set. If this flag is set, then the packet is tracked and other flags may also be set.

snat (0x40)
    This packet was transformed by source address/port translation by a preceding ct action. Open vSwitch 2.6 added this flag.

dnat (0x80)
    This packet was transformed by destination address/port translation by a preceding ct action. Open vSwitch 2.6 added this flag.

There are additional constraints on these flags, listed in decreasing order of precedence below:

1. If trk is unset, no other flags are set.
2. If trk is set, one or more other flags may be set.
3. If inv is set, only the trk flag is also set.
4. new and est are mutually exclusive.
5. new and rpl are mutually exclusive.
6. rel may be set in conjunction with any other flags.

Future versions of Open vSwitch may define new flags.

What do you think?

BR,
Jan

From: Darrell Ball [mailto:db...@vmware.com]
Sent: Friday, 03 November, 2017 06:26
To: Jan Scheurich <jan.scheur...@ericsson.com>; Rohith Basavaraja <rohith.basavar...@ericsson.com>
Subject: Re: [ovs-discuss] Conntrack issue in OVS (2.6)+DPDK

One update inline regarding syncing the kernel/userspace ESTABLISHED definition. However, we still need to resolve the other discussion points.
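To check one's reading of the proposed text, the bit values, the +flag/-flag match syntax and the precedence constraints can be expressed in a few lines of C. The helpers below are hypothetical illustrations, not the parser or validator in OVS; the sketch handles only the symbolic names, not the numeric flags/mask form.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bit values from the flag list above. */
enum {
    CT_NEW = 0x01, CT_EST = 0x02, CT_REL = 0x04, CT_RPL = 0x08,
    CT_INV = 0x10, CT_TRK = 0x20, CT_SNAT = 0x40, CT_DNAT = 0x80,
};

static const struct {
    const char *name;
    uint32_t bit;
} ct_flags[] = {
    { "new", CT_NEW }, { "est", CT_EST }, { "rel", CT_REL },
    { "rpl", CT_RPL }, { "inv", CT_INV }, { "trk", CT_TRK },
    { "snat", CT_SNAT }, { "dnat", CT_DNAT },
};

/* Parses e.g. "+trk-new" into value/mask; unmentioned flags stay
 * wildcarded (mask bit 0). Returns false on malformed input. */
static bool parse_ct_match(const char *p, uint32_t *value, uint32_t *mask)
{
    const size_t n_flags = sizeof ct_flags / sizeof ct_flags[0];

    *value = 0;
    *mask = 0;
    while (*p) {
        char sign = *p++;
        size_t i;

        if (sign != '+' && sign != '-') {
            return false;
        }
        for (i = 0; i < n_flags; i++) {
            if (!strncmp(p, ct_flags[i].name, strlen(ct_flags[i].name))) {
                break;
            }
        }
        if (i == n_flags) {
            return false;
        }
        *mask |= ct_flags[i].bit;
        if (sign == '+') {
            *value |= ct_flags[i].bit;
        }
        p += strlen(ct_flags[i].name);
    }
    return true;
}

static bool ct_state_matches(uint32_t state, uint32_t value, uint32_t mask)
{
    return (state & mask) == value;
}

/* Constraints 1, 3, 4 and 5 from the list above (2 and 6 only permit). */
static bool ct_state_consistent(uint32_t s)
{
    if (!(s & CT_TRK)) {
        return s == 0;
    }
    if ((s & CT_INV) && s != (CT_TRK | CT_INV)) {
        return false;
    }
    if ((s & CT_NEW) && (s & (CT_EST | CT_RPL))) {
        return false;
    }
    return true;
}
```

For example, "-new+est-rel-inv+trk" parses to value 0x22 with mask 0x37, leaving rpl, snat and dnat wildcarded.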
From: Darrel Ball <db...@vmware.com>
Date: Thursday, November 2, 2017 at 7:46 PM
To: Jan Scheurich <jan.scheur...@ericsson.com>, Rohith Basavaraja <rohith.basavar...@ericsson.com>
Subject: Re: [ovs-discuss] Conntrack issue in OVS (2.6)+DPDK

I am checking a few things, so I’ll get back to you, but I have a couple of comments inline.

From: Jan Scheurich <jan.scheur...@ericsson.com>
Date: Thursday, November 2, 2017 at 11:41 AM
To: Darrel Ball <db...@vmware.com>, Rohith Basavaraja <rohith.basavar...@ericsson.com>
Subject: RE: [ovs-discuss] Conntrack issue in OVS (2.6)+DPDK

Hi Darrell,

Sorry for the confusion. One of our concerns actually *is* that the userspace conntrack sets the connection to ESTABLISHED without the commit.

[Darrell] I am still interested to know about this case; please provide a test.

But, more importantly, the kernel conntrack does not set the connection to ESTABLISHED at all through a second ct() lookup for a packet in the same direction, with or without commit.

My example was trying to demonstrate that ct(commit) does not move the new connection to ESTABLISHED, but I didn’t test it, and I now think that it might, though not because of the commit but just because of the second ct() action for the same packet.

[Darrell] Yes, your previous example is not correct for your purpose. Let me clear up some misconceptions: ct(commit) is a prerequisite for EST being set on a later packet. A ‘packet’ (not a connection) that hits an existing conntrack entry is marked as established, and that’s what the documentation says; the idea behind the ‘OVS conntrack EST state’ is to make the state intuitive.
So a better example to demonstrate the misbehavior of the userspace datapath would be as follows:

table=0,priority=10,in_port=1,icmp actions=ct(table=1,zone=5000)
table=0,priority=10,in_port=1,arp actions=output:2
table=0,priority=10,in_port=2 actions=output:1
table=1,priority=10,in_port=1,ct_state=-new+est-rel-inv+trk actions=output:2
table=1,priority=10,in_port=1,ct_state=+new+trk actions=ct(zone=5000),goto_table:2
table=2,priority=10,in_port=1,ct_state=-new+est-rel-inv+trk actions=output:2

This should now move the new connection to the ESTABLISHED state in table 1, so that table 2 will hit and forward the ICMP packet to port 2. The ping should go through with the userspace datapath.

[Darrell] This test does not make sense either. It does not even send the reply packet entering port 2 through conntrack, so it is not properly testing conntrack. There are other problems with this test as well; please check it.

With the kernel datapath, the ICMP packets would still not pass. The problem is really that today we have entirely different conntrack semantics in the kernel and userspace datapaths. Since the kernel conntrack semantics are outside the scope of OVS, we should take them as given, align the userspace conntrack accordingly, and update the OVS documentation to reflect the real semantics, i.e. that a connection only moves to the established state when a conntrack lookup is done for a reply packet.

[Darrell] Bringing userspace ESTABLISHED in line with kernel ESTABLISHED is trivial; for 2.8/master, the change is below (for 2.6 it is similar, but a few lines shorter). None of the 60 or so existing conntrack system tests are affected. However, I added a test, and I’ll submit a patch once we resolve our discussion points. The OVS documented definition of ESTABLISHED is actually better, but I don’t think that is very important, and I think most users will not care or even notice the difference.
diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
index ac0198f..1f6a107 100644
--- a/lib/conntrack-private.h
+++ b/lib/conntrack-private.h
@@ -107,6 +107,7 @@ struct conn {
     uint8_t seq_skew_dir;
     /* True if alg data connection. */
     uint8_t alg_related;
+    uint8_t reply_seen;
 };

 enum ct_update_res {
diff --git a/lib/conntrack.c b/lib/conntrack.c
index e555b55..69061fc 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -912,10 +912,13 @@ conn_update_state(struct conntrack *ct, struct dp_packet *pkt,

     switch (res) {
     case CT_UPDATE_VALID:
-        pkt->md.ct_state |= CS_ESTABLISHED;
-        pkt->md.ct_state &= ~CS_NEW;
         if (ctx->reply) {
             pkt->md.ct_state |= CS_REPLY_DIR;
+            (*conn)->reply_seen = true;
+        }
+        if ((*conn)->reply_seen) {
+            pkt->md.ct_state |= CS_ESTABLISHED;
+            pkt->md.ct_state &= ~CS_NEW;
         }
         break;
     case CT_UPDATE_INVALID:

The commit itself does not change the conntrack state. It only persists the initially temporary conntrack entry in the database.

We have created a simple downstream patch to align the userspace conntrack behavior with the kernel, but we are not allowed to publish it on the mailing list for IPR licensing reasons. The needed code changes are quite straightforward, though.
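The effect of the CT_UPDATE_VALID change in the diff can be exercised in isolation with the reduced C model below. It is a sketch, not OVS code: struct conn and struct pkt_md here are hypothetical stand-ins that keep only the fields the hunk touches, the CS_* values follow the ovs-fields bit assignments, and it assumes the caller starts each lookup with trk and new set, since the hunk itself only clears new.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ct_state bits, named as in the patch, with ovs-fields values. */
#define CS_NEW         0x01u
#define CS_ESTABLISHED 0x02u
#define CS_REPLY_DIR   0x08u
#define CS_TRACKED     0x20u

/* Hypothetical reductions of the connection and packet metadata. */
struct conn {
    bool reply_seen;
};

struct pkt_md {
    uint32_t ct_state;
};

/* Mirrors the patched CT_UPDATE_VALID branch: the first reply marks the
 * connection, and only then do packets in either direction become est. */
static void update_valid(struct conn *conn, struct pkt_md *md, bool reply)
{
    if (reply) {
        md->ct_state |= CS_REPLY_DIR;
        conn->reply_seen = true;
    }
    if (conn->reply_seen) {
        md->ct_state |= CS_ESTABLISHED;
        md->ct_state &= ~CS_NEW;
    }
}
```

Replaying request, reply, request through update_valid() shows the first request staying +new, the reply becoming +est+rpl, and later requests becoming +est, which is the kernel-like behavior the thread is aiming for.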
BR,
Jan

From: Darrell Ball [mailto:db...@vmware.com]
Sent: Thursday, 02 November, 2017 18:53
To: Jan Scheurich <jan.scheur...@ericsson.com>; Rohith Basavaraja <rohith.basavar...@ericsson.com>
Subject: Re: [ovs-discuss] Conntrack issue in OVS (2.6)+DPDK

From: Jan Scheurich <jan.scheur...@ericsson.com>
Date: Thursday, November 2, 2017 at 10:14 AM
To: Rohith Basavaraja <rohith.basavar...@ericsson.com>
Cc: Darrel Ball <db...@vmware.com>
Subject: RE: [ovs-discuss] Conntrack issue in OVS (2.6)+DPDK

Hi Rohith,

To illustrate that ct(commit) does not move a new connection to the established state as stated in [2], it should be enough to check that the initial ICMP packet is dropped in table 2 of this simplistic conntrack pipeline:

[Darrell] I am confused; the problem statement from Rohith was and is: “The userspace datapath flags a packet as ESTABLISHED ‘without’ a ct(commit)”?

table=0,priority=10,in_port=1,icmp actions=ct(table=1,zone=5000)
table=0,priority=10,in_port=1,arp actions=output:2
table=0,priority=10,in_port=2 actions=output:1
table=1,priority=10,in_port=1,ct_state=-new+est-rel-inv+trk actions=output:2
table=1,priority=10,in_port=1,ct_state=+new+trk actions=ct(commit,zone=5000),goto_table:2
table=2,priority=10,in_port=1,ct_state=-new+est-rel-inv+trk actions=output:2

In addition, we should dump the conntrack state after the initial packet has passed. The second ICMP packet in the same direction should then move the packet to the established state in table 0 and immediately send it to port 2 in table 1.

In contrast, the behavior of the kernel datapath would be to drop all ICMP packets sent from port 1 to port 2, as the return packet is never seen by conntrack.
BR,
Jan

From: Darrell Ball [mailto:db...@vmware.com]
Sent: Thursday, 02 November, 2017 17:31
To: Rohith Basavaraja <rohith.basavar...@ericsson.com>; ovs-disc...@openvswitch.org
Cc: Jan Scheurich <jan.scheur...@ericsson.com>
Subject: Re: [ovs-discuss] Conntrack issue in OVS (2.6)+DPDK

From: <ovs-discuss-boun...@openvswitch.org> on behalf of Rohith Basavaraja <rohith.basavar...@ericsson.com>
Date: Wednesday, November 1, 2017 at 10:28 PM
To: "ovs-disc...@openvswitch.org" <ovs-disc...@openvswitch.org>
Cc: Jan Scheurich <jan.scheur...@ericsson.com>
Subject: Re: [ovs-discuss] Conntrack issue in OVS (2.6)+DPDK

Hi,

It’s been quite some time since I raised this issue, so I thought I would update the thread with our findings. Following is a summary of our findings and analysis; we think the OVS userspace datapath conntrack implementation needs to be fixed, otherwise some of the security group deployments mentioned below might fail.

Analysis/Findings
=================

Currently, the OVS kernel datapath implementation has the ct_state (conntrack state) semantics described in the following document:
http://www.iptables.info/en/connection-state.html [1]

The OVS userspace datapath doesn’t follow the above semantics, and the ct_state description in the OVS specification (http://openvswitch.org/support/dist-docs/ovs-fields.7.pdf) [2] is not correct either, as explained below.

The main issue is when the conntrack state “CS_ESTABLISHED” is set for a tracked flow. In the kernel datapath and iptables, a tracked flow moves to the established state only once it sees a reply packet in the reverse direction. The userspace conntrack, in contrast, moves a tracked connection to the established state as soon as a newly tracked connection is looked up for the first time, irrespective of the direction of the packet.

[Darrell] Are you sure? The expectation is that the 2.6 userspace datapath behavior adheres to the OVS specification below. Please provide a test case that shows this is not the case; I would be interested.

Finally, the OVS specification [2] defines the “est” state as “est (0x02) Part of an existing connection. Set to 1 if this is a committed connection”. This means that the tracked connection would move to the established state when the ct(commit) action is executed, and these semantics match neither the kernel nor the userspace behavior.

Because of the above difference, some Security Group (SG) use cases fail; for example, isolation between VMs whose SGs should not allow communication with each other does not work when the VMs are on the same compute node.
[Darrell] We had a lengthy offline email discussion about this from 8/23 to 8/30. The last few exchanges are below.

///////////////////////////////////////////////
From: Rohith Basavaraja <rohith.basavar...@ericsson.com>
Date: Wednesday, August 30, 2017 at 9:05 AM
To: Darrel Ball <db...@vmware.com>
Subject: Re: [ovs-discuss] Conntrack issue in OVS (2.6)+DPDK

Hi Darrell,

Thanks a lot for the help and for sharing the useful information.

Thanks
Rohith

From: Darrell Ball <db...@vmware.com>
Date: Wednesday, 30 August 2017 at 9:18 PM
To: Rohith Basavaraja <rohith.basavar...@ericsson.com>
Subject: Re: [ovs-discuss] Conntrack issue in OVS (2.6)+DPDK

Hi Rohith

From: Rohith Basavaraja <rohith.basavar...@ericsson.com>
Date: Wednesday, August 30, 2017 at 3:46 AM
To: Darrel Ball <db...@vmware.com>
Subject: Re: [ovs-discuss] Conntrack issue in OVS (2.6)+DPDK

Hi Darrell,

For the userspace datapath, do we have any other tools to dump conntrack entries, other than *ovs-appctl dpctl/dump-conntrack*?

[Darrell] Right now, we have:
  " dump-conntrack [DP] [zone=ZONE] " \
  "display conntrack entries for ZONE\n"
  " flush-conntrack [DP] [zone=ZONE] " \
  "delete all conntrack entries in ZONE\n"
  " ct-stats-show [DP] [zone=ZONE] [verbose] " \
  "CT connections grouped by protocol\n"
  " ct-bkts [DP] [gt=N] display connections per CT bucket\n"
This is from ./utilities/ovs-dpctl.c

For the kernel datapath, I see that we can use conntrack -L to dump the entries. Is the conntrack tool for the kernel datapath only?

[Darrell] Yes

In general, are any other conntrack commands or tools available for the userspace datapath?
[Darrell] Right now, the ones mentioned above. Of course, the well-known commands also tell a lot about what is happening in conntrack indirectly:

    sudo ovs-ofctl dump-flows br0
    sudo ovs-appctl dpif/dump-flows br0

Sorry for so many queries; please let me know if it's bothering you.

[Darrell] No problem at all.

Thanks
Rohith

From: Rohith Basavaraja <rohith.basavar...@ericsson.com>
Date: Wednesday, 30 August 2017 at 1:57 PM
To: Darrell Ball <db...@vmware.com>
Subject: Re: [ovs-discuss] Conntrack issue in OVS (2.6)+DPDK

Hi Darrell,

Thanks for the suggestions.

Thanks
Rohith

From: Darrell Ball <db...@vmware.com>
Date: Wednesday, 30 August 2017 at 12:17 PM
To: Rohith Basavaraja <rohith.basavar...@ericsson.com>
Subject: Re: [ovs-discuss] Conntrack issue in OVS (2.6)+DPDK

Hi Rohith

So the previous answer is the solution then; elaborating:

You want a committed connection VM2 -> VM1 (i.e. originated from VM2); this allows VM1 to send replies to VM2.
You want to prevent creating a committed connection from VM1 -> VM2.
This can be done in various ways, by using in_port, zones (per logical port), dl_src, dl_dst, etc.
So traffic originated from VM1 -> VM2 will always be new.

Thanks
Darrell

From: Rohith Basavaraja <rohith.basavar...@ericsson.com>
Date: Tuesday, August 29, 2017 at 11:19 PM
To: Darrel Ball <db...@vmware.com>
Subject: Re: [ovs-discuss] Conntrack issue in OVS (2.6)+DPDK

Hi Darrell,

Just to clarify, the following is the use case:

1. VM1 can originate/initiate traffic to VM2.
2. VM1 can receive traffic from VM2.
3. VM2 should not receive any new connection from VM1.
4. VM2 can originate/initiate traffic to VM1.

Thanks
Rohith

From: Darrell Ball <db...@vmware.com>
Date: Wednesday, 30 August 2017 at 11:37 AM
To: Rohith Basavaraja <rohith.basavar...@ericsson.com>
Subject: Re: [ovs-discuss] Conntrack issue in OVS (2.6)+DPDK

Hi Rohith

Just to confirm:

1/ VM1 can never send traffic to VM2 (originate or reply)?
OR
2/ VM1 cannot originate traffic to VM2, but VM1 can send reply traffic to VM2?

I have been assuming '2'.

Thanks
Darrell

///////////////////////////////////////////////

Thanks
Rohith

From: Rohith Basavaraja <rohith.basavar...@ericsson.com>
Date: Thursday, 24 August 2017 at 12:43 AM
To: Darrell Ball <db...@vmware.com>, "ovs-disc...@openvswitch.org" <ovs-disc...@openvswitch.org>
Subject: Re: [ovs-discuss] Conntrack issue in OVS (2.6)+DPDK

Hi Darrell,

Yes, the expected outcome is to drop new or non-related connections and to allow only related or established connections. For clarity, I am adding the details of the topology and the pipeline rules.

Description of the topology
=========================

VM1 and VM4 are on the same compute node but in different SGs.

For VM4, the security rules configured are:
    Egress/Ingress: Allow all

For VM1:
    Egress: Allow all
    Ingress: Allow only from VMs which are in the same security group

For the above combination, all the required conntrack flows (tables 213 and 214 on the VM egress side, and tables 243 and 244 on the ingress side) are properly programmed in OVS.

For traffic sent from VM4 to VM1, conntrack is allowing traffic which should have been dropped, because ingress to VM1 is to be allowed only from VMs in the same SG. For VM1, conntrack sends the traffic directly to the "ct_state=-new+est-rel-inv+trk" flow, bypassing the "ct_state=+new+trk" flow in the ingress direction.
Following are the pipeline rule details
==============================

VM4 is on 112/15: (dpdkvhostuser: configured_rx_queues=1, configured_tx_queues=1, mtu=2140, requested_rx_queues=1, requested_tx_queues=1)
VM1 is on 108/11: (dpdkvhostuser: configured_rx_queues=1, configured_tx_queues=1, mtu=2140, requested_rx_queues=1, requested_tx_queues=1)

VM4 IP: 172.20.1.113 MAC: fa:16:3e:55:9e:33
VM1 IP: 172.20.1.117 MAC: fa:16:3e:f7:72:d3

I am doing a ping from VM4 (172.20.1.113) to VM1 (172.20.1.117).

cookie=0x8000000, duration=2809.367s, table=0, n_packets=74, n_bytes=10426, priority=4,in_port=112,vlan_tci=0x0000/0x1fff actions=write_metadata:0x19f40000000000/0xffffff0000000001,goto_table:17
cookie=0x6900000, duration=2809.343s, table=17, n_packets=74, n_bytes=10426, priority=10,metadata=0x19f40000000000/0xffffff0000000000 actions=write_metadata:0x4019f40000000000/0xfffffffffffffffe,goto_table:211
cookie=0x6900000, duration=2809.313s, table=211, n_packets=54, n_bytes=8674, priority=61010,ip,metadata=0x19f40000000000/0x1fffff0000000000,dl_src=fa:16:3e:55:9e:33,nw_src=172.20.1.113 actions=goto_table:212
cookie=0x6900000, duration=15546.529s, table=212, n_packets=3669, n_bytes=361562, priority=61010,icmp actions=goto_table:213
cookie=0x6900000, duration=2809.308s, table=213, n_packets=54, n_bytes=8674, priority=61010,ip,metadata=0x19f40000000000/0x1fffff0000000000 actions=ct(table=214,zone=5021)
cookie=0x6900000, duration=15546.544s, table=214, n_packets=3660, n_bytes=367508, priority=62020,ct_state=-new+est-rel-inv+trk actions=resubmit(,17)
cookie=0x6900001, duration=2809.304s, table=214, n_packets=2, n_bytes=180, priority=62015,ct_state=+inv+trk,metadata=0x19f40000000000/0x1fffff0000000000 actions=drop
cookie=0x6900000, duration=2809.295s, table=214, n_packets=10, n_bytes=908, priority=1000,ct_state=+new+trk,ip,metadata=0x19f40000000000/0x1fffff0000000000 actions=ct(commit,zone=5021),resubmit(,17)
cookie=0x6800001, duration=2807.340s, table=17, n_packets=72, n_bytes=10246, priority=10,metadata=0x4019f40000000000/0xffffff0000000000 actions=write_metadata:0xc019f40000000000/0xfffffffffffffffe,goto_table:60
cookie=0x6800000, duration=15546.596s, table=60, n_packets=262265, n_bytes=19604926, priority=0 actions=resubmit(,17)
cookie=0x8040000, duration=2807.338s, table=17, n_packets=70, n_bytes=9562, priority=10,metadata=0xc019f40000000000/0xffffff0000000000 actions=write_metadata:0xe019f4139d000000/0xfffffffffffffffe,goto_table:48
cookie=0x8500000, duration=15546.528s, table=48, n_packets=302458, n_bytes=34190652, priority=0 actions=resubmit(,49),resubmit(,50)
cookie=0x805139d, duration=2808.378s, table=50, n_packets=70, n_bytes=9562, priority=20,metadata=0x19f4139d000000/0x1fffffffff000000,dl_src=fa:16:3e:55:9e:33 actions=goto_table:51
cookie=0x803139d, duration=2818.232s, table=51, n_packets=34, n_bytes=4613, priority=20,metadata=0x139d000000/0xffff000000,dl_dst=fa:16:3e:f7:72:d3 actions=load:0x1a5300->NXM_NX_REG6[],resubmit(,220)
cookie=0x6900000, duration=2819.193s, table=220, n_packets=3455, n_bytes=207541, priority=6,reg6=0x1a5300 actions=load:0xe01a5300->NXM_NX_REG6[],write_metadata:0xe01a530000000000/0xfffffffffffffffe,goto_table:241
cookie=0x6900000, duration=2819.237s, table=241, n_packets=32, n_bytes=4851, priority=61010,ip,metadata=0x1a530000000000/0x1fffff0000000000,dl_dst=fa:16:3e:f7:72:d3,nw_dst=172.20.1.117 actions=goto_table:242
cookie=0x6900000, duration=15546.579s, table=242, n_packets=3738, n_bytes=368372, priority=61010,icmp actions=goto_table:243
cookie=0x6900000, duration=2819.235s, table=243, n_packets=32, n_bytes=4851, priority=61010,ip,metadata=0x1a530000000000/0x1fffff0000000000 actions=ct(table=244,zone=5021)
cookie=0x6900001, duration=2819.230s, table=244, n_packets=2, n_bytes=196, priority=50,ct_state=+new+trk,metadata=0x1a530000000000/0x1fffff0000000000 actions=drop
cookie=0x6900000, duration=15546.577s, table=244, n_packets=0, n_bytes=0, priority=62020,ct_state=-new-est+rel-inv+trk actions=resubmit(,220)
cookie=0x6900000, duration=15546.552s, table=244, n_packets=3819, n_bytes=431050, priority=62020,ct_state=-new+est-rel-inv+trk actions=resubmit(,220)
cookie=0x8000007, duration=2819.193s, table=220, n_packets=107, n_bytes=9431, priority=7,reg6=0xe01a5300 actions=output:108

Thanks
Rohith

From: Darrell Ball <db...@vmware.com>
Date: Thursday, 24 August 2017 at 12:20 AM
To: Rohith Basavaraja <rohith.basavar...@ericsson.com>, "ovs-disc...@openvswitch.org" <ovs-disc...@openvswitch.org>
Subject: Re: [ovs-discuss] Conntrack issue in OVS (2.6)+DPDK

Hi Rohith

I might have missed the alias earlier.

From the output below, I see that the rule

cookie=0x6900000, duration=15546.577s, table=244, n_packets=0, n_bytes=0, priority=62020,ct_state=-new-est+rel-inv+trk actions=resubmit(,220)

is not being hit. I also see that the rule

cookie=0x6900001, duration=2819.230s, table=244, n_packets=2, n_bytes=196, priority=50, ct_state=+new+trk,metadata=0x1a530000000000/0x1fffff0000000000 actions=drop

has a drop action.

What is the expectation of the test? Is table 244 intended to drop non-related and non-established packets?
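Read together, the flows in this thread show why the ingress lookup can find an existing connection: the ping is looked up and committed in zone 5021 on egress (tables 213/214) and then looked up again in the same zone 5021 on ingress (table 243). The Python sketch below is a toy per-zone connection table, not the OVS userspace or Linux kernel code. It assumes two ways of flagging a later original-direction packet of a committed connection: kernel-style, where +est is only reported once packets have been seen in both directions, and a pre-patch userspace style, where any later packet of a committed connection is reported as +est. ToyConntrack, ConnEntry, ping_vm4_to_vm1 and the alternative ingress zone 6001 are hypothetical.

```python
from dataclasses import dataclass, field

# Toy model only: not the OVS userspace or Linux kernel conntrack code.


@dataclass
class ConnEntry:
    """A committed connection; remembers whether a reply has been seen."""
    seen_reply: bool = False


@dataclass
class ToyConntrack:
    """Per-zone connection table keyed by (zone, src, dst).

    kernel_style=True:  assumed kernel rule, +est only after packets of the
                        committed connection were seen in both directions.
    kernel_style=False: assumed pre-patch userspace rule, any later packet
                        of a committed connection is reported as +est.
    """
    kernel_style: bool
    entries: dict = field(default_factory=dict)

    def ct(self, zone, src, dst):
        """Model ct(zone=...) without commit; return the ct_state flags."""
        fwd, rev = (zone, src, dst), (zone, dst, src)
        if fwd in self.entries:
            if self.entries[fwd].seen_reply or not self.kernel_style:
                return {"trk", "est"}
            return {"trk", "new"}
        if rev in self.entries:
            # Reply direction of a committed connection.
            self.entries[rev].seen_reply = True
            return {"trk", "est", "rpl"}
        # Unknown connection: nothing is stored until ct(commit) runs.
        return {"trk", "new"}

    def commit(self, zone, src, dst):
        """Model ct(commit,zone=...): create the entry if it does not exist."""
        if (zone, src, dst) not in self.entries and \
                (zone, dst, src) not in self.entries:
            self.entries[(zone, src, dst)] = ConnEntry()


def ping_vm4_to_vm1(kernel_style, egress_zone=5021, ingress_zone=5021):
    """Walk one ICMP request VM4 -> VM1 through the two ct() stages."""
    vm4, vm1 = "172.20.1.113", "172.20.1.117"
    ct = ToyConntrack(kernel_style)
    # Egress, tables 213/214: look up, then commit if the packet is +new.
    if "new" in ct.ct(egress_zone, vm4, vm1):
        ct.commit(egress_zone, vm4, vm1)
    # Ingress, tables 243/244: the same packet is looked up again.
    state = ct.ct(ingress_zone, vm4, vm1)
    # Table 244: +new+trk is dropped, -new+est-rel-inv+trk is resubmitted.
    return "drop" if "new" in state else "allow"


print(ping_vm4_to_vm1(kernel_style=True))                       # drop
print(ping_vm4_to_vm1(kernel_style=False))                      # allow
print(ping_vm4_to_vm1(kernel_style=False, ingress_zone=6001))   # drop
```

Under the kernel-style assumption the second lookup returns +new and table 244 drops the ping. Under the assumed pre-patch userspace rule it returns +est and the ping is allowed, which is the OVS + DPDK behaviour reported in this thread. Giving the ingress side its own zone, in the spirit of Darrell's "zones (per logical port)" suggestion, makes the ingress lookup +new under either assumption.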
Thanks
Darrell

From: <ovs-discuss-boun...@openvswitch.org> on behalf of Rohith Basavaraja <rohith.basavar...@ericsson.com>
Date: Wednesday, August 23, 2017 at 3:03 AM
To: "ovs-disc...@openvswitch.org" <ovs-disc...@openvswitch.org>
Subject: [ovs-discuss] Conntrack issue in OVS (2.6)+DPDK

Hi,

I see that if I have the following rules, i.e. do not allow any new connections and allow only established and related flows,

cookie=0x6900001, duration=2819.230s, table=244, n_packets=2, n_bytes=196, priority=50, ct_state=+new+trk,metadata=0x1a530000000000/0x1fffff0000000000 actions=drop
cookie=0x6900000, duration=15546.577s, table=244, n_packets=0, n_bytes=0, priority=62020,ct_state=-new-est+rel-inv+trk actions=resubmit(,220)
cookie=0x6900000, duration=15546.552s, table=244, n_packets=3819, n_bytes=431050, priority=62020,ct_state=-new+est-rel-inv+trk actions=resubmit(,220)

we are still seeing new connections being allowed. We see this behavior only with OVS + DPDK, not with the OVS kernel datapath. I wanted to check whether this issue has already been reported elsewhere or whether it is a new issue.

Thanks
Rohith

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev