Re: [ovs-dev] [PATCH ovn v2 0/6] MAC binding aging mechanism

2022-07-17 Thread Han Zhou
On Wed, Jul 13, 2022 at 1:49 AM Dumitru Ceara  wrote:
>
> On 7/13/22 08:45, Han Zhou wrote:
> > On Tue, Jul 12, 2022 at 1:02 AM Dumitru Ceara  wrote:
> >>
> >> On 7/11/22 09:26, Ales Musil wrote:
> >>> On Mon, Jul 11, 2022 at 8:51 AM Han Zhou  wrote:
> >>>
> 
> 
> 
>  On Fri, Jul 1, 2022 at 3:19 AM Ales Musil  wrote:
> >
> > Hi,
> >
> > as promised I have more results from testing. The results are
> > available on the BZ comment 6 [0].
> >
> > So about the scenario, there were two traffic types being sent. The
> > document has TCP latency and throughput, and UDP throughput.
> > The traffic was split into two halves: 625 connections (50 flows)
> > were left without touching the MAC binding table. The same amount
> > (625 connections, 50 flows) was disrupted by periodic removal of MAC
> > bindings every 20 sec.
> >
> > IMO the results prove two points:
> > 1) Removal of a MAC binding does not seem to affect unrelated flows,
> > which was a huge concern.
> > 2) There might be some added value in keeping the connection alive
> > as long as it is used. The UDP disrupted graph shows that the
> > throughput was not able to catch up again after the deletion. It's
> > debatable, since the ramp-up time is probably longer than the
> > removal interval, and 20 s is not really sensible for production.
> > Anyway, the connection check would prevent that, but only on the
> > "owner" chassis.
> >
> > So now it's probably about deciding what compromise to make: having
> > the owner with a potential improvement around ownership transfer, or
> > having just a simple timeout that will remove anything that expired.
> >
> > Han, Dumitru, Numan,
> > please let me know what you think about these results.
> >
> 
> Thanks Ales for the detailed test results. From the graphs you
> shared, it does look like the impact is quite obvious, for both
> throughput and latency, even for TCP, right? Looking at the TCP
> throughput line, for about 50% of the time it was below 2G, while the
> one without disruption was above 10G for most of the time.
> Did I interpret the graph correctly? Why is it different from your
> observation earlier (is it because of the max bandwidth of the test
> env)? Does it suggest that a simple timeout mechanism is not suitable?
>
> Thanks,
> Han
> 
> >>>
> >>>
> >>> Hi Han,
> >>> yes, you interpreted it correctly. I think that the difference is
> >>> because of the very high bandwidth (within multiple flows) in this
> >>> test, and the removal was quite quick. So the traffic was not able
> >>> to catch up again in time before it got disrupted again. Also,
> >>> Xena seems to wait a bit for the other related traffic before it
> >>> continues the previous stream, so that might have added a bit of
> >>> jitter on its own. The simple timeout might still be a solution,
> >>> but we would need to make sure of two things:
> >>> a) The timeout is large enough so that, if there is high bandwidth
> >>> traffic, it can catch up again.
> >>
> > Sounds reasonable, provided that the feature is configurable and can be
> > disabled for environments that are more sensitive to such disruptions.
> >
>
> I would go a step further and make it configurable per logical
> router. Thinking of the ovn-k8s use case (but probably applicable to
> OpenStack and other CMSes), there's probably less impact if we enable
> the feature on gateway routers than if we enable it on the
> distributed central cluster router.
>
+1. Good point.
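
A per-router knob could look something like this (the option name and
semantics are purely hypothetical at this point in the discussion,
shown only to illustrate the shape of a per-logical-router setting):

```shell
# Hypothetical: enable aging only on a gateway router, with a timeout
# in seconds; actual option naming was still under discussion here.
ovn-nbctl set Logical_Router gw-router options:mac_binding_age_threshold=300
```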

> >> I went back to the RFC, as this seems like it would have a
> >> significant forwarding impact if we unconditionally remove ARP
> >> entries:
> >>
> >> https://datatracker.ietf.org/doc/html/rfc1122#page-22
> >>
> >> 2.3.2.1  ARP Cache Validation
> >> [..]
> >>
> >>  (1)  Timeout -- Periodically time out cache entries,
> >>   even if they are in use.  Note that this timeout
> >>   should be restarted when the cache entry is
> >>   "refreshed" (by observing the source fields,
> >>   regardless of target address, of an ARP broadcast
> >>   from the system in question).  For proxy ARP
> >>   situations, the timeout needs to be on the order
> >>   of a minute.
> >>
> >> This mentions that an ARP entry's timeout should be restarted if the
> >> entry is refreshed.  That is, when the host for which we have the ARP
> >> entry sent an ARP broadcast.
> >>
> >> I think that makes it less likely to remove the entry while IP traffic
> >> is using it.
> >>
> >> In our case that might be too complex to implement
> >
> > I haven't thought through all details yet, but can we just handle ARP
> > requests through some extra pipelines with slowpath 
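
For reference, the timeout-restart-on-refresh behavior RFC 1122
describes can be sketched as follows. This is a minimal illustrative
model, not OVN code; the class, method names, and the default timeout
are all assumptions made for the sketch:

```python
import time

class MacBindingCache:
    """Aging cache whose timeout restarts on refresh, in the spirit of
    RFC 1122 sec. 2.3.2.1 (ARP cache validation)."""

    def __init__(self, timeout_s=300):
        self.timeout_s = timeout_s
        self.entries = {}  # ip -> (mac, last_refreshed_timestamp)

    def learn(self, ip, mac, now=None):
        now = time.monotonic() if now is None else now
        self.entries[ip] = (mac, now)

    def refresh(self, ip, now=None):
        # Restart the timeout when an ARP broadcast from the host for
        # this entry is observed, regardless of its target address.
        now = time.monotonic() if now is None else now
        if ip in self.entries:
            mac, _ = self.entries[ip]
            self.entries[ip] = (mac, now)

    def expire(self, now=None):
        # Drop entries whose timeout elapsed without a refresh.
        now = time.monotonic() if now is None else now
        expired = [ip for ip, (_, t) in self.entries.items()
                   if now - t >= self.timeout_s]
        for ip in expired:
            del self.entries[ip]
        return expired
```

With this shape, an entry carrying active traffic keeps being
refreshed and never ages out, which is what makes removal less likely
to disrupt in-use flows.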

[ovs-dev] [PATCH] conntrack: Fix conntrack multiple new state

2022-07-17 Thread Eli Britstein via dev
A connection is established if we see packets from both directions.
The cited commit [1] fixed the issue of sending twice in one direction,
but the issue remains if more packets than that are sent.
Fix it.

Fixes: a867c010ee91 ("conntrack: Fix conntrack new state")
Signed-off-by: Eli Britstein 
---
 lib/conntrack-other.c   | 7 ---
 tests/system-traffic.at | 9 +
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/lib/conntrack-other.c b/lib/conntrack-other.c
index d3b4601858..7f3e63c384 100644
--- a/lib/conntrack-other.c
+++ b/lib/conntrack-other.c
@@ -48,18 +48,19 @@ other_conn_update(struct conntrack *ct, struct conn *conn_,
   struct dp_packet *pkt OVS_UNUSED, bool reply, long long now)
 {
 struct conn_other *conn = conn_other_cast(conn_);
-enum ct_update_res ret = CT_UPDATE_VALID;
 
 if (reply && conn->state != OTHERS_BIDIR) {
 conn->state = OTHERS_BIDIR;
 } else if (conn->state == OTHERS_FIRST) {
 conn->state = OTHERS_MULTIPLE;
-ret = CT_UPDATE_VALID_NEW;
 }
 
 conn_update_expiration(ct, &conn->up, other_timeouts[conn->state], now);
 
-return ret;
+if (conn->state == OTHERS_BIDIR) {
+return CT_UPDATE_VALID;
+}
+return CT_UPDATE_VALID_NEW;
 }
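
The behavioral change can be modeled with a small state machine. The
following is an illustrative Python sketch of the C logic above, not
the OVS source; constant and function names mirror the patch:

```python
# States and transitions mirror other_conn_update() after this patch.
OTHERS_FIRST, OTHERS_MULTIPLE, OTHERS_BIDIR = range(3)
CT_UPDATE_VALID, CT_UPDATE_VALID_NEW = "valid", "valid_new"

def other_conn_update(state, reply):
    """Return (new_state, update_result) for one observed packet."""
    if reply and state != OTHERS_BIDIR:
        state = OTHERS_BIDIR
    elif state == OTHERS_FIRST:
        state = OTHERS_MULTIPLE
    # After the fix: only a bidirectional connection reports VALID;
    # any number of one-directional packets keeps reporting VALID_NEW,
    # not just the second one.
    if state == OTHERS_BIDIR:
        return state, CT_UPDATE_VALID
    return state, CT_UPDATE_VALID_NEW
```

Before the fix, a third one-directional packet found the connection
already in OTHERS_MULTIPLE and fell through to returning
CT_UPDATE_VALID, wrongly marking it established.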
 
 static bool
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 89107ab624..182a78847e 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -3078,6 +3078,15 @@ NXST_FLOW reply:
  table=1, priority=100,ct_state=+est+trk,in_port=1 actions=output:2
 ])
 
+dnl Send a 3rd UDP packet on port 1
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1 packet=5054000a505400090800451c0011a4cd0a0101010a010102000100020008 actions=resubmit(,0)"])
+
+dnl There still should not be any packet that matches the established ct_state.
+AT_CHECK([ovs-ofctl dump-flows br0 "table=1 in_port=1,ct_state=+trk+est" | ofctl_strip], [0], [dnl
+NXST_FLOW reply:
+ table=1, priority=100,ct_state=+est+trk,in_port=1 actions=output:2
+])
+
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
-- 
2.26.2.1730.g385c171

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v5 13/17] python: introduce unit tests

2022-07-17 Thread David Marchand
On Fri, Jul 8, 2022 at 8:09 PM Adrian Moreno  wrote:
> diff --git a/.ci/linux-prepare.sh b/.ci/linux-prepare.sh
> index a0635cf56..de741c6b4 100755
> --- a/.ci/linux-prepare.sh
> +++ b/.ci/linux-prepare.sh
> @@ -45,6 +45,9 @@ if [ "$M32" ]; then
>  sudo apt-get install -y $pkgs
>  fi
>
> +# Install python test dependencies
> +pip install -r python/test_requirements.txt
> +

Should it be pip3?

We still run Ubuntu 18.04, and "pip" points at python2 pip.
At some point, we preferred pip3 (24e697080948 ("travis: Use pip3 to
install the python packages on linux")).
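
One way to sidestep the ambiguity (a suggestion for the CI script, not
something tested against the OVS images) is to invoke pip through the
interpreter so the Python major version is explicit:

```shell
# Install python test dependencies. 'python3 -m pip' avoids depending
# on whether the bare 'pip' entry point belongs to python2 or python3
# on the CI image (e.g. Ubuntu 18.04, where 'pip' is python2's).
python3 -m pip install -r python/test_requirements.txt
```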


>  # IPv6 is supported by kernel but disabled in TravisCI images:
>  #   https://github.com/travis-ci/travis-ci/issues/8891
>  # Enable it to avoid skipping of IPv6 related tests.


-- 
David Marchand
