[ovs-dev] [PATCH] lib: fix cmap_find_protected
cmap_find_protected calculated wrong h2 hash which causing entries with duplicated id inserted into the cmap. Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-December/047945.html Signed-off-by: Zang MingJie --- lib/cmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/cmap.c b/lib/cmap.c index cb9cd32ab..c9eef3f4a 100644 --- a/lib/cmap.c +++ b/lib/cmap.c @@ -568,7 +568,7 @@ cmap_find_protected(const struct cmap *cmap, uint32_t hash) { struct cmap_impl *impl = cmap_get_impl(cmap); uint32_t h1 = rehash(impl, hash); -uint32_t h2 = other_hash(hash); +uint32_t h2 = other_hash(h1); struct cmap_node *node; node = cmap_find_bucket_protected(impl, hash, h1); -- 2.20.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] lib: fix cmap_find_protected
We have found a bug where recirc node with duplicate id has been inserted to the cmap, details of the bug in the following email: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-December/047945.html Finally we found that cmap_find_protected calculated wrong h2: struct cmap_impl *impl = cmap_get_impl(cmap); uint32_t h1 = rehash(impl, hash); uint32_t h2 = other_hash(hash); Here is how h2 is calculated in other functions: const struct cmap_impl *impl = cmap_get_impl(cmap); uint32_t h1 = rehash(impl, hash); uint32_t h2 = other_hash(h1); On Mon, Dec 24, 2018 at 4:25 PM Zang MingJie wrote: > --- > lib/cmap.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/lib/cmap.c b/lib/cmap.c > index cb9cd32ab..c9eef3f4a 100644 > --- a/lib/cmap.c > +++ b/lib/cmap.c > @@ -568,7 +568,7 @@ cmap_find_protected(const struct cmap *cmap, uint32_t > hash) > { > struct cmap_impl *impl = cmap_get_impl(cmap); > uint32_t h1 = rehash(impl, hash); > -uint32_t h2 = other_hash(hash); > +uint32_t h2 = other_hash(h1); > struct cmap_node *node; > > node = cmap_find_bucket_protected(impl, hash, h1); > -- > 2.20.1 > > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH] lib: fix cmap_find_protected
--- lib/cmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/cmap.c b/lib/cmap.c index cb9cd32ab..c9eef3f4a 100644 --- a/lib/cmap.c +++ b/lib/cmap.c @@ -568,7 +568,7 @@ cmap_find_protected(const struct cmap *cmap, uint32_t hash) { struct cmap_impl *impl = cmap_get_impl(cmap); uint32_t h1 = rehash(impl, hash); -uint32_t h2 = other_hash(hash); +uint32_t h2 = other_hash(h1); struct cmap_node *node; node = cmap_find_bucket_protected(impl, hash, h1); -- 2.20.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] Bug: select group with dp_hash causing recursive recirculation
Hi, we found a serious problem where one pmd is stop working, I want to share the problem and find solution here. vswitchd log: 2018-09-13T23:36:44.377Z|40269235|dpif_netdev(pmd45)|WARN|Packet dropped. Max recirculation depth exceeded. 2018-09-13T23:36:44.387Z|40269236|dpif_netdev(pmd45)|WARN|Packet dropped. Max recirculation depth exceeded. 2018-09-13T23:36:44.391Z|40269237|dpif_netdev(pmd45)|WARN|Packet dropped. Max recirculation depth exceeded. problematic datapath flows: ct_state(+new-est),recirc_id(0x143c893),in_port(2),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(proto=6,frag=no),tcp(dst=443), packets:84573093, bytes:6308807903, used:0.009s, flags:SFPRU.ECN[200][400][800], actions:meter(306),hash(hash_l4(0)),recirc(0x237b09d) recirc_id(0x237b09d),in_port(2),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:279713339, bytes:20890205186, used:0.007s, flags:SFPRU.ECN[200][400][800], actions:hash(hash_l4(0)),recirc(0x237b09d) corresponding openflow: cookie=0x5b5ab65e000f0101, duration=4848269.642s, table=40, n_packets=974343805, n_bytes=72484367083, priority=10,tcp,metadata=0xf0100/0xff00,tp_dst=443 actions=group:983297 group_id=983297,type=select,selection_method=dp_hash,bucket=bucket_id:3057033848,weight:100,actions=ct(commit,table=70,zone=15,exec(nat(dst=10.177.251.203:443))),...``lots of buckets``... Following explains how select group with dp_hash works. To implement select group with dp_hash, two datapath flows are needed: 1. calculate dp_hash, recirculate to second one 2. select group bucket by dp_hash When encounter a datapath miss, openflow doesn't know which one is missing, so it depends on dp_hash value of the packet: if dp_hash == 0 generate first dp flow. if dp_hash != 0 generate second dp flow. Back to the problem. Notice that second datapath flow is a dead loop, it recirculate to itself. The cause of the problem is here ofproto/ofproto-dpif-xlate.c#L4429[1]: /* dp_hash value 0 is special since it means that the dp_hash has not been * computed, as all computed dp_hash values are non-zero. Therefore * compare to zero can be used to decide if the dp_hash value is valid * without masking the dp_hash field. */ if (!dp_hash) { The comment saying that `dp_hash` shouldn't be zero, but under DPDK, it can be zero, at lib/odp-execute.c#L747[2] /* RSS hash can be used here instead of 5tuple for * performance reasons. */ if (dp_packet_rss_valid(packet)) { hash = dp_packet_get_rss_hash(packet); hash = hash_int(hash, hash_act->hash_basis); } else { flow_extract(packet, ); hash = flow_hash_5tuple(, hash_act->hash_basis); } packet->md.dp_hash = hash; I don't know how small chance that `hash_int` returns 0, we have tested that if the final hash is 0, will definitely trigger the same bug. And due to the chance is extremely low, I'm also investigation that if there are other situation that will pass 0 hash to ofp. IMO, it is silly to depends on dp_hash value, maybe we need a new mechanism which can pass data between ofp and odp freely. And a quick solution could be just change the 0 hash to 1. [1] https://github.com/openvswitch/ovs/blob/master/ofproto/ofproto-dpif-xlate.c#L4429 [2] https://github.com/openvswitch/ovs/blob/master/lib/odp-execute.c#L747 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] conntrack: invalid packet should not modify ct state
On Thu, Sep 20, 2018 at 8:47 AM Darrell Ball wrote: > > > On Fri, Sep 14, 2018 at 1:46 AM, Zang MingJie > wrote: > >> >> > Did you notice this check ? >> >> > >> >> > if (src->state < CT_DPIF_TCPS_SYN_SENT) { >> >> > /* First packet from this end. Set its state */ >> >> >> >> Yes, this is exactly where we found the problem. If first reply packet >> >> is invalid, it masses all following packets. >> > >> > >> > >> > Based on your description in the commit message: >> > " Currently an invalid packet can change the seq number while the >> connection is in SEQ_RECV state." >> > we don't fall into this condition since SEQ_RECV > >> CT_DPIF_TCPS_SYN_SENT, right ? >> > >> > If you did not mean "SEQ_RECV", maybe you meant something else ? >> > What the value of "src->state" where you saw an issue ? >> >> SYN_RECV is our server side tcp state, >> > > Thanks for clarifying what you referring to. > > > >> ct table track either side tcp state separately, the problem happens in >> this situation: >> >>client | ct.src ct.dst | server >> packet: syn -> >> state :SYN_SEND | SYN_SEND CLOSED | SYN_RECV >> packet: <- malformed packet >> state: SYN_SEND | SYN_SEND SYN_SEND | SYN_RECV <-updated to wrong >> state >> packet: <- syn+ack <-invalid >> > > That's all fine, but details are needed. > > Is the second packet crafted to be invalid ? > For that matter, is the first packet crafted to be bogus as well ? > > Pls send along: > > 1/ First packet tcp fields > first packet is normal tcp syn packet, nothing special > 2/ Second packet tcp fields > second invalid packet is triggered due to a previous kernel bug, in general it is packet of previous connection with same 5-tuple, everything is wrong including flags, seq and ack number. the actual value doesn't matter as long as it is invalid, as I have already shown, the CT state of this peer will go to SYN_SEND even that no syn flag was set, and the wrong seq number in the invalid packet will be tracked. > 3/ The tcp lengths if the response is a crafted packet with unexpected > data. > it doesn't matter at all. In the situation where we found the bug, it's a fin packet of the last connection, no data. The major problem is following checking you have posted: if (src->state < CT_DPIF_TCPS_SYN_SENT) { It is too loose, and if meet, the peer state will be changed, it is unexpected. > > Thanks Darrell > > >> >> malformed packet will set wrong seq_lo and seq_hi to the state, >> preventing following packets pass ct. >> >> > >> >> >> >> >> >> > >> >> > >> >> > >> >> >> >> >> >> >> >> >> Signed-off-by: Zang MingJie >> >> >> --- >> >> >> lib/conntrack-tcp.c | 10 -- >> >> >> 1 file changed, 8 insertions(+), 2 deletions(-) >> >> >> >> >> >> diff --git a/lib/conntrack-tcp.c b/lib/conntrack-tcp.c >> >> >> index 86d313d28..7443a3293 100644 >> >> >> --- a/lib/conntrack-tcp.c >> >> >> +++ b/lib/conntrack-tcp.c >> >> >> @@ -151,9 +151,9 @@ tcp_conn_update(struct conn *conn_, struct >> conntrack_bucket *ctb, >> >> >> struct conn_tcp *conn = conn_tcp_cast(conn_); >> >> >> struct tcp_header *tcp = dp_packet_l4(pkt); >> >> >> /* The peer that sent 'pkt' */ >> >> >> -struct tcp_peer *src = >peer[reply ? 1 : 0]; >> >> >> +struct tcp_peer orig_src, *src = >peer[reply ? 1 : 0]; >> >> >> /* The peer that should receive 'pkt' */ >> >> >> -struct tcp_peer *dst = >peer[reply ? 0 : 1]; >> >> >> +struct tcp_peer orig_dst, *dst = >peer[reply ? 0 : 1]; >> >> >> uint8_t sws = 0, dws = 0; >> >> >> uint16_t tcp_flags = TCP_FLAGS(tcp->tcp_ctl); >> >> >> >> >> >> @@ -187,6 +187,10 @@ tcp_conn_update(struct conn *conn_, struct >> conntrack_bucket *ctb, >> >> >> dws = TCP_MAX_WSCALE; >> >> >> } >> >> >> >> >> >> + >> >> >> +orig_src = *src; >> >> >> +orig_dst = *dst; >> >> >> + >> >> >> /* >> >> >> * Sequence tracking algorithm from Guido van Rooij's paper: >> >> >> * http://www.madison-gurkha.com/publications/tcp_filtering/ >> >> >> @@ -385,6 +389,8 @@ tcp_conn_update(struct conn *conn_, struct >> conntrack_bucket *ctb, >> >> >> src->state = dst->state = CT_DPIF_TCPS_TIME_WAIT; >> >> >> } >> >> >> } else { >> >> >> +*src = orig_src; >> >> >> +*dst = orig_dst; >> >> >> return CT_UPDATE_INVALID; >> >> >> } >> >> >> >> >> >> -- >> >> >> 2.19.0.rc1 >> >> >> >> >> >> ___ >> >> >> dev mailing list >> >> >> d...@openvswitch.org >> >> >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >> >> > >> >> > >> > >> > >> > > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] conntrack: invalid packet should not modify ct state
>> > Did you notice this check ? >> > >> > if (src->state < CT_DPIF_TCPS_SYN_SENT) { >> > /* First packet from this end. Set its state */ >> >> Yes, this is exactly where we found the problem. If first reply packet >> is invalid, it masses all following packets. > > > > Based on your description in the commit message: > " Currently an invalid packet can change the seq number while the connection is in SEQ_RECV state." > we don't fall into this condition since SEQ_RECV > CT_DPIF_TCPS_SYN_SENT, right ? > > If you did not mean "SEQ_RECV", maybe you meant something else ? > What the value of "src->state" where you saw an issue ? SYN_RECV is our server side tcp state, ct table track either side tcp state separately, the problem happens in this situation: client | ct.src ct.dst | server packet: syn -> state :SYN_SEND | SYN_SEND CLOSED | SYN_RECV packet: <- malformed packet state: SYN_SEND | SYN_SEND SYN_SEND | SYN_RECV <-updated to wrong state packet: <- syn+ack <-invalid malformed packet will set wrong seq_lo and seq_hi to the state, preventing following packets pass ct. > >> >> >> > >> > >> > >> >> >> >> >> >> Signed-off-by: Zang MingJie >> >> --- >> >> lib/conntrack-tcp.c | 10 -- >> >> 1 file changed, 8 insertions(+), 2 deletions(-) >> >> >> >> diff --git a/lib/conntrack-tcp.c b/lib/conntrack-tcp.c >> >> index 86d313d28..7443a3293 100644 >> >> --- a/lib/conntrack-tcp.c >> >> +++ b/lib/conntrack-tcp.c >> >> @@ -151,9 +151,9 @@ tcp_conn_update(struct conn *conn_, struct conntrack_bucket *ctb, >> >> struct conn_tcp *conn = conn_tcp_cast(conn_); >> >> struct tcp_header *tcp = dp_packet_l4(pkt); >> >> /* The peer that sent 'pkt' */ >> >> -struct tcp_peer *src = >peer[reply ? 1 : 0]; >> >> +struct tcp_peer orig_src, *src = >peer[reply ? 1 : 0]; >> >> /* The peer that should receive 'pkt' */ >> >> -struct tcp_peer *dst = >peer[reply ? 0 : 1]; >> >> +struct tcp_peer orig_dst, *dst = >peer[reply ? 0 : 1]; >> >> uint8_t sws = 0, dws = 0; >> >> uint16_t tcp_flags = TCP_FLAGS(tcp->tcp_ctl); >> >> >> >> @@ -187,6 +187,10 @@ tcp_conn_update(struct conn *conn_, struct conntrack_bucket *ctb, >> >> dws = TCP_MAX_WSCALE; >> >> } >> >> >> >> + >> >> +orig_src = *src; >> >> +orig_dst = *dst; >> >> + >> >> /* >> >> * Sequence tracking algorithm from Guido van Rooij's paper: >> >> * http://www.madison-gurkha.com/publications/tcp_filtering/ >> >> @@ -385,6 +389,8 @@ tcp_conn_update(struct conn *conn_, struct conntrack_bucket *ctb, >> >> src->state = dst->state = CT_DPIF_TCPS_TIME_WAIT; >> >> } >> >> } else { >> >> +*src = orig_src; >> >> +*dst = orig_dst; >> >> return CT_UPDATE_INVALID; >> >> } >> >> >> >> -- >> >> 2.19.0.rc1 >> >> >> >> ___ >> >> dev mailing list >> >> d...@openvswitch.org >> >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >> > >> > > > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] conntrack: invalid packet should not modify ct state
On Thu, Sep 13, 2018 at 2:55 AM Darrell Ball wrote: > > Thanks for looking MingJie > > > On Wed, Sep 12, 2018 at 2:16 AM, Zang MingJie wrote: >> >> When encounter an invalid packet, all changes made by the packet should >> be reverted. Currently an invalid packet can change the seq number while >> the connection is in SEQ_RECV state. > > > > Did you notice this check ? > > if (src->state < CT_DPIF_TCPS_SYN_SENT) { > /* First packet from this end. Set its state */ Yes, this is exactly where we found the problem. If first reply packet is invalid, it masses all following packets. > > > >> >> >> Signed-off-by: Zang MingJie >> --- >> lib/conntrack-tcp.c | 10 -- >> 1 file changed, 8 insertions(+), 2 deletions(-) >> >> diff --git a/lib/conntrack-tcp.c b/lib/conntrack-tcp.c >> index 86d313d28..7443a3293 100644 >> --- a/lib/conntrack-tcp.c >> +++ b/lib/conntrack-tcp.c >> @@ -151,9 +151,9 @@ tcp_conn_update(struct conn *conn_, struct >> conntrack_bucket *ctb, >> struct conn_tcp *conn = conn_tcp_cast(conn_); >> struct tcp_header *tcp = dp_packet_l4(pkt); >> /* The peer that sent 'pkt' */ >> -struct tcp_peer *src = >peer[reply ? 1 : 0]; >> +struct tcp_peer orig_src, *src = >peer[reply ? 1 : 0]; >> /* The peer that should receive 'pkt' */ >> -struct tcp_peer *dst = >peer[reply ? 0 : 1]; >> +struct tcp_peer orig_dst, *dst = >peer[reply ? 0 : 1]; >> uint8_t sws = 0, dws = 0; >> uint16_t tcp_flags = TCP_FLAGS(tcp->tcp_ctl); >> >> @@ -187,6 +187,10 @@ tcp_conn_update(struct conn *conn_, struct >> conntrack_bucket *ctb, >> dws = TCP_MAX_WSCALE; >> } >> >> + >> +orig_src = *src; >> +orig_dst = *dst; >> + >> /* >> * Sequence tracking algorithm from Guido van Rooij's paper: >> * http://www.madison-gurkha.com/publications/tcp_filtering/ >> @@ -385,6 +389,8 @@ tcp_conn_update(struct conn *conn_, struct >> conntrack_bucket *ctb, >> src->state = dst->state = CT_DPIF_TCPS_TIME_WAIT; >> } >> } else { >> +*src = orig_src; >> +*dst = orig_dst; >> return CT_UPDATE_INVALID; >> } >> >> -- >> 2.19.0.rc1 >> >> ___ >> dev mailing list >> d...@openvswitch.org >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH] conntrack: invalid packet should not modify ct state
When encounter an invalid packet, all changes made by the packet should be reverted. Currently an invalid packet can change the seq number while the connection is in SEQ_RECV state. Signed-off-by: Zang MingJie --- lib/conntrack-tcp.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/lib/conntrack-tcp.c b/lib/conntrack-tcp.c index 86d313d28..7443a3293 100644 --- a/lib/conntrack-tcp.c +++ b/lib/conntrack-tcp.c @@ -151,9 +151,9 @@ tcp_conn_update(struct conn *conn_, struct conntrack_bucket *ctb, struct conn_tcp *conn = conn_tcp_cast(conn_); struct tcp_header *tcp = dp_packet_l4(pkt); /* The peer that sent 'pkt' */ -struct tcp_peer *src = >peer[reply ? 1 : 0]; +struct tcp_peer orig_src, *src = >peer[reply ? 1 : 0]; /* The peer that should receive 'pkt' */ -struct tcp_peer *dst = >peer[reply ? 0 : 1]; +struct tcp_peer orig_dst, *dst = >peer[reply ? 0 : 1]; uint8_t sws = 0, dws = 0; uint16_t tcp_flags = TCP_FLAGS(tcp->tcp_ctl); @@ -187,6 +187,10 @@ tcp_conn_update(struct conn *conn_, struct conntrack_bucket *ctb, dws = TCP_MAX_WSCALE; } + +orig_src = *src; +orig_dst = *dst; + /* * Sequence tracking algorithm from Guido van Rooij's paper: * http://www.madison-gurkha.com/publications/tcp_filtering/ @@ -385,6 +389,8 @@ tcp_conn_update(struct conn *conn_, struct conntrack_bucket *ctb, src->state = dst->state = CT_DPIF_TCPS_TIME_WAIT; } } else { +*src = orig_src; +*dst = orig_dst; return CT_UPDATE_INVALID; } -- 2.19.0.rc1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] OVS-DPDK: fast host internal interface
Hi: Currently with user space datapath, only internal tap device can be used to communicate with kernel. Because packets must be copied from user space to kernel, the throughput of tap implementation is very slow, not suitable for data plane. Now DPDK provides some alternative solutions. 1. KNI interface. https://doc.dpdk.org/guides/nics/kni.html It requires a kernel module, and seems not well maintained. 2. Tap/Tun PMD. https://doc.dpdk.org/guides/nics/tap.html Instead of using ovs main thread to communicate with tap device, it use its own pmd thread, which may increase performance. 3. vhost+virtio adapter. https://doc.dpdk.org/guides/howto/virtio_user_as_exceptional_path.html It utilizes existing vhost and vhost-net kernel module, emulate a virtio device, communicate directly to vhost ring. This looks like a good solution. Looking for suggestions ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [patch v2 1/2] conntrack: Fix nat_clean.
I don't think the patch will resolve the problem. Once ctb->lock is released, other thread may have chance to acquire the lock and modify ctb. In general, ctb->lock can not be released in this function, another approach is needed. On Wed, Aug 29, 2018 at 3:31 PM Darrell Ball wrote: > > nat_clean has a defunct optimization for calculating a hash outside the > scope of a bucket lock which can lead to a race in referencing a freed > conntrack entry. Adjust to avoid this. Needs backporting to 2.8. > > Reported-at: > https://mail.openvswitch.org/pipermail/ovs-dev/2018-August/351629.html > Fixes: 286de2729955 ("dpdk: Userspace Datapath: Introduce NAT Support.") > Signed-off-by: Darrell Ball > --- > lib/conntrack.c | 14 -- > 1 file changed, 8 insertions(+), 6 deletions(-) > > diff --git a/lib/conntrack.c b/lib/conntrack.c > index be8debb..692f2b8 100644 > --- a/lib/conntrack.c > +++ b/lib/conntrack.c > @@ -778,20 +778,22 @@ nat_clean(struct conntrack *ct, struct conn *conn, > { > ct_rwlock_wrlock(>resources_lock); > nat_conn_keys_remove(>nat_conn_keys, >rev_key, ct->hash_basis); > -ct_rwlock_unlock(>resources_lock); > -ct_lock_unlock(>lock); > unsigned bucket_rev_conn = > hash_to_bucket(conn_key_hash(>rev_key, ct->hash_basis)); > +struct conn_key rev_key = conn->rev_key; > +ct_rwlock_unlock(>resources_lock); > +ct_lock_unlock(>lock); > + > ct_lock_lock(>buckets[bucket_rev_conn].lock); > ct_rwlock_wrlock(>resources_lock); > long long now = time_msec(); > -struct conn *rev_conn = conn_lookup(ct, >rev_key, now); > +struct conn *rev_conn = conn_lookup(ct, _key, now); > struct nat_conn_key_node *nat_conn_key_node = > -nat_conn_keys_lookup(>nat_conn_keys, >rev_key, > +nat_conn_keys_lookup(>nat_conn_keys, _key, > ct->hash_basis); > > -/* In the unlikely event, rev conn was recreated, then skip > - * rev_conn cleanup. */ > +/* In the unlikely event, 'rev_conn' was recreated, then skip > + * 'rev_conn' cleanup. */ > if (rev_conn && (!nat_conn_key_node || > conn_key_cmp(_conn_key_node->value, >_conn->rev_key))) { > -- > 1.9.1 > > ___ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] Several conntrack problems, including some critical bugs.
While developing application using ovs userspace conntrack, we found some bugs worth mention here. 1. conntrack_clean may causes ovs crash. conntrack_clean function iterators through all buckets, and free entries in the bucket with bucket lock, but when releasing a NAT connection, inside nat_clean function, the bucket lock is temporarily released, if other PMD acquires the lock and modifies the bucket, further loop may causing invalid memory access inside sweep_bucket function. 2. occasionally incorrectly DNAT to 1024 port, despite whatever port specified. We found 2 scenarios, both leads to this result. First, consider there are two virtual server share the same backend, which are implemented by DNAT, both V1 and V2 are DNAT to R. While there is already a connection C->V1 which is DNAT as C->R, if there is another incoming connection C->V2, will also DNAT as C->R, causing conntrack table conflict, but instead of dropping the packet, the connection is DNAT to port 1024. Because the NAT function search through port 1024 - 65535 when conflict occurred. Second, if a conntrack entry is expired but not yet released, mostly in TIMEWAIT state, the client may reuse the same port to establish a new connection, when this condition is met, will also cause a conflict, the connection will DNAT to port 1024 if DNAT is used. There are also some other problems under investigation, and I'll post them when we find the cause. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] Upper limit of QoS burst
Hi: I found that in struct ofputil_meter_band, rate and burst are all 32 bits, the max burst is: for bandwidth: 4GB = 32Gb for pps: 4G packets/1000 = 4M packets In our high performance setup, 4M packets burst is easily achievable, so I suggest change these variable to uint64_t ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [BUG] ovs-ofctl version 2.5.0 will crash with OFPFMFC_BAD_COMMAND
Confirmed, can be easily reproduced using described method. Using ovs 2.6.2 On 01/23/2017 11:58 PM, Vidyasagara Guntaka via dev wrote: Hi Ben, We could reproduce this with the latest version 2.6.1. When we compiled the code, we removed -O2 from CFLAGS. This seems to make it happen more frequently. With the following script running, the error starts happening within a few seconds and then continues to happen after every few seconds. In summary, our suspicion is that having no controller set and having no NORMAL processing flow seems to trigger the stack that was pointed out in the gdb session more often and hence we hit this race condition very easily using the following script. (Even if there is a default NORMAL processing flow entry, after the first iteration of the script below, that will be deleted). Also, a few things about the setup - just in case: * enp5s0 belongs to the physical interface on this hypervisor. * vport0 and vport1 belong to tap interfaces corresponding to two VMs running on this hypervisor. * When the script was running, we had pings issued from the VMs so that packets make it to the bridge br0. Here is a small script that makes it happen on multiple of our hypervisors: while true do ovs-ofctl add-flow br0 "priority=200,table=123,idle_timeout=1,in_port=1,actions=controller" ovs-ofctl add-flow br0 "priority=200,table=123,idle_timeout=1,in_port=2,actions=controller" ovs-ofctl add-flow br0 "priority=200,table=123,idle_timeout=1,in_port=3,actions=controller" ovs-ofctl add-flow br0 "priority=200,table=123,idle_timeout=1,in_port=4,actions=controller" ovs-ofctl del-flows br0 done Here is our bridge br0 setup: [root@deepspace ~]# ifconfig br0 br0: flags=4163mtu 1500 inet 192.168.2.142 netmask 255.255.255.0 broadcast 192.168.2.255 inet6 fe80::213:3bff:fe0f:1301 prefixlen 64 scopeid 0x20 ether 00:13:3b:0f:13:01 txqueuelen 1000 (Ethernet) RX packets 89417814 bytes 12088012200 (11.2 GiB) RX errors 0 dropped 82 overruns 0 frame 0 TX packets 32330647 bytes 3168352394 (2.9 GiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 [root@deepspace ~]# ovs-vsctl show 54f89e00-edd2-486e-9626-6d11c7d8b0b6 Bridge "br0" Port "vport1" Interface "vport1" Port "br0" Interface "br0" type: internal Port vtep Interface vtep type: vxlan options: {key=flow, remote_ip="192.168.1.141"} Port "vport0" Interface "vport0" Port "enp5s0" Interface "enp5s0" ovs_version: “2.6.1" [root@deepspace ~]# ovs-ofctl show br0 OFPT_FEATURES_REPLY (xid=0x2): dpid:00133b0f1301 n_tables:254, n_buffers:256 capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst 1(enp5s0): addr:00:13:3b:0f:13:01 config: 0 state: 0 current:1GB-FD AUTO_NEG advertised: 10MB-HD 10MB-FD 100MB-HD 100MB-FD 1GB-HD 1GB-FD COPPER AUTO_NEG AUTO_PAUSE AUTO_PAUSE_ASYM supported: 10MB-HD 10MB-FD 100MB-HD 100MB-FD 1GB-HD 1GB-FD COPPER AUTO_NEG speed: 1000 Mbps now, 1000 Mbps max 2(vport0): addr:fe:00:00:00:00:03 config: 0 state: 0 current:10MB-FD COPPER speed: 10 Mbps now, 0 Mbps max 3(vport1): addr:fe:00:00:00:00:04 config: 0 state: 0 current:10MB-FD COPPER speed: 10 Mbps now, 0 Mbps max 4(vtep): addr:aa:97:d2:a9:19:ed config: 0 state: 0 speed: 0 Mbps now, 0 Mbps max LOCAL(br0): addr:00:13:3b:0f:13:01 config: 0 state: 0 speed: 0 Mbps now, 0 Mbps max OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0 [root@deepspace ~]# ovs-ofctl --version ovs-ofctl (Open vSwitch) 2.6.1 OpenFlow versions 0x1:0x4 Please let us know if you need anything else to reproduce this. Thanks, Sagar. On Jan 18, 2017, at 1:19 PM, Ben Pfaff wrote: If you can come up with simple reproduction instructions that work for me, I'm happy to track this down. It's probably something very simple. On Tue, Jan 17, 2017 at 08:50:20AM -0800, Vidyasagara Guntaka wrote: This issue happened on our in-use systems and we were trying to find a way to move forward avoiding this issue so that we do not have to upgrade OVS on thousands of our hypervisors causing down time. Our debugging did help us avoid the issue for now by installing an explicit rule to to drop packets when there is no match and this issue is not seen over many hours of test runs. We will definitely run this test with latest version. But, will need more time since we are busy with our release related activities. Regards, Sagar. On Tue, Jan 17, 2017 at 8:42 AM, Ben Pfaff
Re: [ovs-dev] [PATCH] netdev-dpdk: add dpdk interface strip_vlan option
Yes, It is better to use the bit-field. I'll update the patch after the refereed patch has been applied On Tue, Apr 4, 2017 at 10:03 PM, Kevin Traynor <ktray...@redhat.com> wrote: > On 03/15/2017 08:46 AM, Zang MingJie wrote: >> dpdk-strip-vlan option specifies whether strip vlan for the dpdk interface. >> >> Signed-off-by: Zang MingJie <zealot0...@gmail.com> >> --- >> lib/netdev-dpdk.c| 23 ++- >> vswitchd/vswitch.xml | 7 +++ >> 2 files changed, 29 insertions(+), 1 deletion(-) >> >> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c >> index ddc651bf9..ea49adf3e 100644 >> --- a/lib/netdev-dpdk.c >> +++ b/lib/netdev-dpdk.c >> @@ -373,6 +373,7 @@ struct netdev_dpdk { >> int requested_n_rxq; >> int requested_rxq_size; >> int requested_txq_size; >> +bool requested_strip_vlan; >> >> /* Number of rx/tx descriptors for physical devices */ >> int rxq_size; >> @@ -395,6 +396,8 @@ struct netdev_dpdk { >> /* DPDK-ETH hardware offload features, >> * from the enum set 'dpdk_hw_ol_features' */ >> uint32_t hw_ol_features; >> + >> +bool strip_vlan; > > I don't think this should have it's own fields in the data struct as > there is a hw_ol_feature bitmask intended for that. At the moment it is > only used for rx checksum offload and could easily be extended for strip > vlan if people think it's a good feature to expose. Maybe the reason you > didn't use it in this patch is because it's currently broken and does > nothing on reconfiguration :| > > I've sent a patch to fix that here > https://mail.openvswitch.org/pipermail/ovs-dev/2017-April/330481.html > >> }; >> >> struct netdev_rxq_dpdk { >> @@ -646,6 +649,7 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int >> n_rxq, int n_txq) >> conf.rxmode.jumbo_frame = 0; >> conf.rxmode.max_rx_pkt_len = 0; >> } >> +conf.rxmode.hw_vlan_strip = dev->strip_vlan; >> conf.rxmode.hw_ip_checksum = (dev->hw_ol_features & >>NETDEV_RX_CHECKSUM_OFFLOAD) != 0; >> /* A device may report more queues than it makes available (this has >> @@ -1133,6 +1137,19 @@ netdev_dpdk_process_devargs(const char *devargs, char >> **errp) >> } >> >> static void >> +dpdk_set_strip_vlan_config(struct netdev_dpdk *dev, const struct smap *args) >> +OVS_REQUIRES(dev->mutex) >> +{ >> +bool strip_vlan; >> + >> +strip_vlan = smap_get_bool(args, "dpdk-strip-vlan", false); >> +if (strip_vlan != dev->requested_strip_vlan) { >> +dev->requested_strip_vlan = strip_vlan; >> +netdev_request_reconfigure(>up); >> +} >> +} >> + >> +static void >> dpdk_set_rxq_config(struct netdev_dpdk *dev, const struct smap *args) >> OVS_REQUIRES(dev->mutex) >> { >> @@ -1182,6 +1199,7 @@ netdev_dpdk_set_config(struct netdev *netdev, const >> struct smap *args, >> ovs_mutex_lock(>mutex); >> >> dpdk_set_rxq_config(dev, args); >> +dpdk_set_strip_vlan_config(dev, args); >> >> dpdk_process_queue_size(netdev, args, "n_rxq_desc", >> NIC_PORT_DEFAULT_RXQ_SIZE, >> @@ -3123,7 +3141,8 @@ netdev_dpdk_reconfigure(struct netdev *netdev) >> && dev->mtu == dev->requested_mtu >> && dev->rxq_size == dev->requested_rxq_size >> && dev->txq_size == dev->requested_txq_size >> -&& dev->socket_id == dev->requested_socket_id) { >> +&& dev->socket_id == dev->requested_socket_id >> +&& dev->strip_vlan == dev->requested_strip_vlan) { >> /* Reconfiguration is unnecessary */ >> >> goto out; >> @@ -3145,6 +3164,8 @@ netdev_dpdk_reconfigure(struct netdev *netdev) >> dev->rxq_size = dev->requested_rxq_size; >> dev->txq_size = dev->requested_txq_size; >> >> +dev->strip_vlan = dev->requested_strip_vlan; >> + >> rte_free(dev->tx_q); >> err = dpdk_eth_dev_init(dev); >> dev->tx_q = netdev_dpdk_alloc_txq(netdev->n_txq); >> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml >> index a91be59b0..5c0083188 100644 >> --- a/vswitchd/vswitch.xml >> +++ b/vswitchd/vswitch.xml >> @@ -2360,6 +2360,13 @@ >> >> >> >> + >> + >> + Specifies whether strip vlan for the dpdk interface. It is useful >> + when using ixgbevf interface with a vlan filter. >> + >> + >> + >>>type='{"type": "string"}'> >> >> > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH] netdev-dpdk: add dpdk interface strip_vlan option
dpdk-strip-vlan option specifies whether strip vlan for the dpdk interface. Signed-off-by: Zang MingJie <zealot0...@gmail.com> --- lib/netdev-dpdk.c| 23 ++- vswitchd/vswitch.xml | 7 +++ 2 files changed, 29 insertions(+), 1 deletion(-) diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index ddc651bf9..ea49adf3e 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -373,6 +373,7 @@ struct netdev_dpdk { int requested_n_rxq; int requested_rxq_size; int requested_txq_size; +bool requested_strip_vlan; /* Number of rx/tx descriptors for physical devices */ int rxq_size; @@ -395,6 +396,8 @@ struct netdev_dpdk { /* DPDK-ETH hardware offload features, * from the enum set 'dpdk_hw_ol_features' */ uint32_t hw_ol_features; + +bool strip_vlan; }; struct netdev_rxq_dpdk { @@ -646,6 +649,7 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq) conf.rxmode.jumbo_frame = 0; conf.rxmode.max_rx_pkt_len = 0; } +conf.rxmode.hw_vlan_strip = dev->strip_vlan; conf.rxmode.hw_ip_checksum = (dev->hw_ol_features & NETDEV_RX_CHECKSUM_OFFLOAD) != 0; /* A device may report more queues than it makes available (this has @@ -1133,6 +1137,19 @@ netdev_dpdk_process_devargs(const char *devargs, char **errp) } static void +dpdk_set_strip_vlan_config(struct netdev_dpdk *dev, const struct smap *args) +OVS_REQUIRES(dev->mutex) +{ +bool strip_vlan; + +strip_vlan = smap_get_bool(args, "dpdk-strip-vlan", false); +if (strip_vlan != dev->requested_strip_vlan) { +dev->requested_strip_vlan = strip_vlan; +netdev_request_reconfigure(>up); +} +} + +static void dpdk_set_rxq_config(struct netdev_dpdk *dev, const struct smap *args) OVS_REQUIRES(dev->mutex) { @@ -1182,6 +1199,7 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args, ovs_mutex_lock(>mutex); dpdk_set_rxq_config(dev, args); +dpdk_set_strip_vlan_config(dev, args); dpdk_process_queue_size(netdev, args, "n_rxq_desc", NIC_PORT_DEFAULT_RXQ_SIZE, @@ -3123,7 +3141,8 @@ netdev_dpdk_reconfigure(struct netdev *netdev) && dev->mtu == dev->requested_mtu && dev->rxq_size == dev->requested_rxq_size && dev->txq_size == dev->requested_txq_size -&& dev->socket_id == dev->requested_socket_id) { +&& dev->socket_id == dev->requested_socket_id +&& dev->strip_vlan == dev->requested_strip_vlan) { /* Reconfiguration is unnecessary */ goto out; @@ -3145,6 +3164,8 @@ netdev_dpdk_reconfigure(struct netdev *netdev) dev->rxq_size = dev->requested_rxq_size; dev->txq_size = dev->requested_txq_size; +dev->strip_vlan = dev->requested_strip_vlan; + rte_free(dev->tx_q); err = dpdk_eth_dev_init(dev); dev->tx_q = netdev_dpdk_alloc_txq(netdev->n_txq); diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index a91be59b0..5c0083188 100644 --- a/vswitchd/vswitch.xml +++ b/vswitchd/vswitch.xml @@ -2360,6 +2360,13 @@ + + + Specifies whether strip vlan for the dpdk interface. It is useful + when using ixgbevf interface with a vlan filter. + + + -- 2.11.0 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH] netdev-dpdk: fix flow based tunnel
The push/pop header of flow based tunnel is generated dynamically by flows, so there won't be any tunnel header associated to netdev. The patch fixes that tnl_port_cache doesn't contain any flow based tunnel. Signed-off-by: Zang MingJie <zealot0...@gmail.com> --- lib/dpif-netdev.c | 2 +- lib/netdev.c | 7 --- 2 files changed, 1 insertion(+), 8 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 7d53a8def..383001e6c 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -3599,7 +3599,7 @@ pmd_load_cached_ports(struct dp_netdev_pmd_thread *pmd) hmap_shrink(>tnl_port_cache); HMAP_FOR_EACH (tx_port, node, >tx_ports) { -if (netdev_has_tunnel_push_pop(tx_port->port->netdev)) { +if (netdev_get_tunnel_config(tx_port->port->netdev) != NULL) { tx_port_cached = xmemdup(tx_port, sizeof *tx_port_cached); hmap_insert(>tnl_port_cache, _port_cached->node, hash_port_no(tx_port_cached->port->port_no)); diff --git a/lib/netdev.c b/lib/netdev.c index a8d8edad7..c10c7760c 100644 --- a/lib/netdev.c +++ b/lib/netdev.c @@ -117,13 +117,6 @@ netdev_is_pmd(const struct netdev *netdev) return netdev->netdev_class->is_pmd; } -bool -netdev_has_tunnel_push_pop(const struct netdev *netdev) -{ -return netdev->netdev_class->push_header - && netdev->netdev_class->pop_header; -} - static void netdev_initialize(void) OVS_EXCLUDED(netdev_mutex) -- 2.11.0 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev