On Wed, Jul 13, 2022 at 1:49 AM Dumitru Ceara <[email protected]> wrote:
>
> On 7/13/22 08:45, Han Zhou wrote:
> > On Tue, Jul 12, 2022 at 1:02 AM Dumitru Ceara <[email protected]> wrote:
> >>
> >> On 7/11/22 09:26, Ales Musil wrote:
> >>> On Mon, Jul 11, 2022 at 8:51 AM Han Zhou <[email protected]> wrote:
> >>>
> >>>>
> >>>>
> >>>>
> >>>> On Fri, Jul 1, 2022 at 3:19 AM Ales Musil <[email protected]> wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> as promised I have more results from testing. The results are
> >>>>> available in BZ comment 6 [0].
> >>>>>
> >>>>> About the scenario: two traffic types were sent. The document has
> >>>>> TCP latency and throughput, and UDP throughput.
> >>>>> The traffic was split into two halves: 625 connections (50 flows)
> >>>>> ran without touching the MAC binding table, and the same amount
> >>>>> (625 connections, 50 flows) was disrupted by periodic removal of
> >>>>> MAC bindings every 20 seconds.
> >>>>>
> >>>>> IMO the results prove two points:
> >>>>> 1) Removal of a MAC binding does not seem to affect unrelated flows,
> >>>>> which was a huge concern.
> >>>>> 2) There might be some added value in keeping the connection alive
> >>>>> as long as it is used. The UDP disrupted graph shows that the
> >>>>> throughput was not able to catch up again after the deletion.
> >>>>> Arguably the ramp-up time is longer than the removal interval, and
> >>>>> 20 s is not really a sensible interval for production. Anyway, the
> >>>>> connection check would prevent that, but only on the "owner"
> >>>>> chassis.
> >>>>>
> >>>>> So now it's probably about deciding which compromise to make:
> >>>>> keeping the owner, with a potential improvement to ownership
> >>>>> transfer, or having just a simple timeout that removes anything
> >>>>> that has expired.
> >>>>>
> >>>>> Han, Dumitru, Numan
> >>>>> please let me know what you think about these results.
> >>>>>
> >>>>
> >>>> Thanks Ales for the detailed test results. From the graphs you
> >>>> shared, it does look like the impact is quite obvious, for both
> >>>> throughput and latency, even for TCP, right? Look at the TCP
> >>>> throughput line: for about 50% of the time it was below 2G, while
> >>>> the one without disruption was above 10G for most of the time.
> >>>> Did I interpret the graph correctly? Why is it different from your
> >>>> observation earlier (is it because of the max bandwidth of the test
> >>>> env)?
> >>>> Does it suggest that a simple timeout mechanism is not suitable?
> >>>>
> >>>> Thanks,
> >>>> Han
> >>>>
> >>>
> >>>
> >>> Hi Han,
> >>> yes, you interpreted it correctly. I think the difference comes from
> >>> the very high bandwidth (across multiple flows) in this test combined
> >>> with the short removal interval, so the traffic was not able to catch
> >>> up in time before it got disrupted again. Also, Xena seems to wait a
> >>> bit for the other related traffic before it continues the previous
> >>> stream, so that might have added a bit of jitter on its own. The
> >>> simple timeout might still be a solution, but we would need to make
> >>> sure of two things:
> >>> a) The timeout is large enough that high bandwidth traffic can catch
> >>> up again.
> >>
> > Sounds reasonable, provided that the feature is configurable and can be
> > disabled for environments that are more sensitive to such disruptions.
> >
>
> I would go a step further and make it configurable per logical router.
> Thinking of the ovn-k8s use case (but probably applicable to OpenStack
> and other CMS), there's probably less impact if we enable the feature on
> gateway routers compared to if we enable it on the distributed central
> cluster router.
>
+1. Good point.

> >> I went back to the RFC as this seems like it would have significant
> >> forwarding impact if we unconditionally remove ARP entries:
> >>
> >> https://datatracker.ietf.org/doc/html/rfc1122#page-22
> >>
> >> 2.3.2.1  ARP Cache Validation
> >> [..]
> >>
> >>                  (1)  Timeout -- Periodically time out cache entries,
> >>                       even if they are in use.  Note that this timeout
> >>                       should be restarted when the cache entry is
> >>                       "refreshed" (by observing the source fields,
> >>                       regardless of target address, of an ARP broadcast
> >>                       from the system in question).  For proxy ARP
> >>                       situations, the timeout needs to be on the order
> >>                       of a minute.
> >>
> >> This mentions that an ARP entry's timeout should be restarted if the
> >> entry is refreshed.  That is, when the host for which we have the ARP
> >> entry sent an ARP broadcast.
> >>
> >> I think that makes it less likely to remove the entry while IP traffic
> >> is using it.
> >>
> >> In our case that might be too complex to implement
> >
> > I haven't thought through all details yet, but can we just handle ARP
> > requests through some extra pipelines with slowpath actions which
> > refresh the expire time? Maybe some ratelimit (meter) and some random
> > delay is needed, but it seems to be structurally independent and clear.
> > Could you point out the blockers that I might have missed?
> >
>
> Sounds possible.  We do have to be careful though because today ARP
> requests are only forwarded inside the OVN switch pipeline on the ports
> connected to routers that own the target IP of the ARP request.  That
> means the other routers connected to the logical switch won't process
> these ARP requests and may fail to refresh their timers.

Today it is controlled by the "always_learn_from_arp_request" option. By
default it is true, and all the requests are handled regardless of the
target.
Even when it is "false", if the documentation is still up-to-date, the
request is still examined to see whether a mac-binding entry for the sender
already exists on the LRP and needs to be updated:

    false - If there is a MAC binding for that IP and the MAC is
    different, or, if TPA of ARP request belongs to any router port on this
    router, then update/add that MAC-IP binding. Otherwise, don’t update/add
    entries.

So, for the timeout refresh, this needs to be changed: if there is a MAC
binding for that IP, not only check whether it needs to be updated, but also
refresh its timestamp.
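
To make the idea concrete, here is a rough sketch in Python. It is only a
toy model of the behaviour described above, not OVN code: the class and
method names, the "tpa_is_local" flag and the caller-supplied clock are all
made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class MacBinding:
    mac: str
    last_refresh: float  # seconds, taken from a caller-supplied clock


class MacBindingTable:
    """Hypothetical per-router-port MAC binding table with aging."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.entries = {}  # sender IP -> MacBinding

    def on_arp_request(self, spa, sha, now, tpa_is_local=False,
                       always_learn=True):
        """Handle an ARP request from sender IP `spa` with MAC `sha`."""
        entry = self.entries.get(spa)
        if entry is not None:
            # Proposed change: an existing binding is not only updated
            # when the MAC differs, its timestamp is refreshed as well.
            entry.mac = sha
            entry.last_refresh = now
        elif always_learn or tpa_is_local:
            # New bindings are learned as before: always when the option
            # is true, otherwise only if the TPA belongs to this router.
            self.entries[spa] = MacBinding(sha, now)

    def expire(self, now):
        """Remove and return the IPs whose binding has timed out."""
        expired = [ip for ip, e in self.entries.items()
                   if now - e.last_refresh >= self.timeout_s]
        for ip in expired:
            del self.entries[ip]
        return expired
```

With a 60 s timeout, a binding learned at t=0 and refreshed by an ARP
request at t=50 would survive a sweep at t=70 and only age out at t=110.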

Thanks,
Han

>
> In any case, this is a detail we can probably iron out later.
>
> >> so maybe we should
> >> consider going back to Numan's suggestion, essentially:
> >>
> >>
> >>                  (2)  Unicast Poll -- Actively poll the remote host by
> >>                       periodically sending a point-to-point ARP Request
> >>                       to it, and delete the entry if no ARP Reply is
> >>                       received from N successive polls.  Again, the
> >>                       timeout should be on the order of a minute, and
> >>                       typically N is 2.
> >>
> >
> > This approach seems more complex to me than (1), because it requires the
> > "owner" role, and it is not easy to manage, because the "owner" (in
> > fact it is just an agent) may need to be transferred.
> > However, of course there is a benefit to this approach: an entry would
> > never be deleted while it is still needed, which ensures no dataplane
> > impact.
> >
> > So, my suggestion is:
> > step 0.5 - implement (1) without the refresh mechanism - just
> > experimental
> > step 1 - implement (1) with refresh mechanism
> > step 2 - implement (2) as a more advanced (but also more complex)
> > implementation, if (1) proves insufficient
> >
>
> Sounds like a good plan forward.
>
> Thanks,
> Dumitru
>
> > Thanks,
> > Han
> >
> >> Thoughts?
> >>
> >>> b) We need to prevent bulk removals of MAC bindings that were added at
> >>> roughly the same time. So we need some sort of limit on how many we
> >>> can remove in a single transaction, or some random delay added upon
> >>> creation. Or we can actually do both, because there is no harm if the
> >>> MAC binding is removed later than the threshold.
> >>
> >> Ales, you posted
> >> https://patchwork.ozlabs.org/project/ovn/patch/[email protected]/
> >> which should cover the addition part if I'm not wrong, but I agree with
> >> Han's concern there about combining the mac binding aging with the
> >> random addition delay.
> >>
> >> A random removal delay seems less risky at first glance.
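
For illustration, a random removal delay combined with a per-pass removal
limit could look roughly like the Python sketch below. This is only a toy
model under assumptions, not OVN code: the function names, the parameters
and the choice of deriving the jitter from the entry itself (so that it is
stable across sweeps) are all hypothetical.

```python
import random


def removal_deadline(ip, timestamp, threshold_s, max_jitter_s):
    """Earliest time at which the binding for `ip` may be removed.

    The jitter is seeded from the entry itself, so repeated sweeps see
    the same deadline without having to store it anywhere.
    """
    jitter = random.Random(f"{ip}@{timestamp}").uniform(0.0, max_jitter_s)
    return timestamp + threshold_s + jitter


def select_for_removal(timestamps, now, threshold_s, max_jitter_s,
                       batch_limit):
    """Pick at most `batch_limit` expired bindings, oldest deadline first.

    `timestamps` maps sender IP -> time the binding was added/refreshed.
    """
    due = []
    for ip, ts in timestamps.items():
        deadline = removal_deadline(ip, ts, threshold_s, max_jitter_s)
        if deadline <= now:
            due.append((deadline, ip))
    due.sort()
    return [ip for _, ip in due[:batch_limit]]
```

Bindings that were all created at the same moment then get spread over
"max_jitter_s" seconds, and whatever is still due at once is removed in
several smaller transactions instead of one large one. Removing an entry a
bit later than the threshold does no harm.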
> >>
> >> Thanks,
> >> Dumitru
> >>
> >>>
> >>> Thanks,
> >>> Ales
> >>>
> >>>
> >>>>
> >>>>> Regards,
> >>>>> Ales
> >>>>>
> >>>>> [0] https://bugzilla.redhat.com/show_bug.cgi?id=2084668#c6
> >>>>>
> >>>>> On Thu, Jun 30, 2022 at 7:32 AM Ales Musil <[email protected]> wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Thu, Jun 30, 2022 at 6:58 AM Han Zhou <[email protected]> wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Mon, Jun 27, 2022 at 11:55 PM Ales Musil <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> so I did the suggested test. Setup was HIV1 - ext0 and vm0,
> >>>>>>>> HIV2 - ext1 and vm1
> >>>>>>>>
> >>>>>>>> The networks were connected as follows:
> >>>>>>>> - vm0 and vm1 on the same switch
> >>>>>>>> - logical router connected with the "internal" and "external"
> >>>>>>>>   switch
> >>>>>>>> - "external" switch connected to ext0 and ext1 through localnet
> >>>>>>>> So the traffic was flowing:
> >>>>>>>> vmX -- LR -- localnet -- extX
> >>>>>>>>
> >>>>>>>> The iperf was running between vm0 - ext1 and vm1 - ext0.
> >>>>>>>>
> >>>>>>>> I have removed the MAC binding for ext0 multiple times to see if
> >>>>>>>> it affects the other traffic.
> >>>>>>>> And it actually does not, which is great.
> >>>>>>>>
> >>>>>>>> iperf output from vm0 - ext1:
> >>>>>>>> [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> >>>>>>>> [  5]   0.00-1.00   sec  12.0 MBytes   101 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]   1.00-2.00   sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]   2.00-3.00   sec  12.0 MBytes   101 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]   3.00-4.00   sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]   4.00-5.00   sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]   5.00-6.00   sec  12.0 MBytes   101 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]   6.00-7.00   sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]   7.00-8.00   sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]   8.00-9.00   sec  12.0 MBytes   101 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]   9.00-10.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  10.00-11.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  11.00-12.00  sec  12.0 MBytes   101 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  12.00-13.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  13.00-14.00  sec  12.0 MBytes   101 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  14.00-15.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  15.00-16.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  16.00-17.00  sec  12.0 MBytes   101 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  17.00-18.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  18.00-19.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  19.00-20.00  sec  12.0 MBytes   101 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  20.00-21.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  21.00-22.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  22.00-23.00  sec  12.0 MBytes   101 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  23.00-24.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  24.00-25.00  sec  12.0 MBytes   101 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  25.00-26.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  26.00-27.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  27.00-28.00  sec  12.0 MBytes   101 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  28.00-29.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  29.00-30.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  30.00-31.00  sec  12.0 MBytes   101 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  31.00-32.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  32.00-33.00  sec  12.0 MBytes   101 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  33.00-34.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  34.00-35.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  35.00-36.00  sec  12.0 MBytes   101 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  36.00-37.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  37.00-38.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  38.00-39.00  sec  12.0 MBytes   101 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  39.00-40.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  40.00-41.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  41.00-42.00  sec  12.0 MBytes   101 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  42.00-43.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  43.00-44.00  sec  12.0 MBytes   101 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  44.00-45.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  45.00-46.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  46.00-47.00  sec  12.0 MBytes   101 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  47.00-48.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  48.00-49.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  49.00-50.00  sec  12.0 MBytes   101 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  50.00-51.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  51.00-52.00  sec  12.0 MBytes   101 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  52.00-53.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  53.00-54.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  54.00-55.00  sec  12.0 MBytes   101 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  55.00-56.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  56.00-57.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  57.00-58.00  sec  12.0 MBytes   101 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  58.00-59.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  59.00-60.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  60.00-61.00  sec  12.0 MBytes   101 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  61.00-62.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  62.00-63.00  sec  12.0 MBytes   101 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  63.00-64.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  64.00-65.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  65.00-66.00  sec  12.0 MBytes   101 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  66.00-67.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  67.00-68.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  68.00-69.00  sec  12.0 MBytes   101 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  69.00-70.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  70.00-71.00  sec  12.0 MBytes   101 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  71.00-72.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  72.00-73.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  73.00-74.00  sec  12.0 MBytes   101 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  74.00-75.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  75.00-76.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  76.00-77.00  sec  12.0 MBytes   101 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  77.00-78.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  78.00-79.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  79.00-80.00  sec  12.0 MBytes   101 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  80.00-81.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  81.00-82.00  sec  12.0 MBytes   101 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  82.00-83.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  83.00-84.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  84.00-85.00  sec  12.0 MBytes   101 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  85.00-86.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  86.00-87.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  87.00-88.00  sec  12.0 MBytes   101 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  88.00-89.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  89.00-90.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  90.00-91.00  sec  12.0 MBytes   101 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  91.00-92.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  92.00-93.00  sec  12.0 MBytes   101 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  93.00-94.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  94.00-95.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  95.00-96.00  sec  12.0 MBytes   101 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  96.00-97.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  97.00-98.00  sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  98.00-99.00  sec  12.0 MBytes   101 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5]  99.00-100.00 sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5] 100.00-101.00 sec  12.0 MBytes   101 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5] 101.00-102.00 sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5] 102.00-103.00 sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5] 103.00-104.00 sec  12.0 MBytes   101 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5] 104.00-105.00 sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5] 105.00-106.00 sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5] 106.00-107.00 sec  12.0 MBytes   101 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5] 107.00-108.00 sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5] 108.00-109.00 sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5] 109.00-110.00 sec  12.0 MBytes   101 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5] 110.00-111.00 sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5] 111.00-112.00 sec  12.0 MBytes   101 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5] 112.00-113.00 sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5] 113.00-114.00 sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5] 114.00-115.00 sec  12.0 MBytes   101 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5] 115.00-116.00 sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5] 116.00-117.00 sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5] 117.00-118.00 sec  12.0 MBytes   101 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5] 118.00-119.00 sec  11.9 MBytes  99.6 Mbits/sec    0    290 KBytes
> >>>>>>>> [  5] 119.00-120.00 sec  12.0 MBytes   101 Mbits/sec    0    290 KBytes
> >>>>>>>> - - - - - - - - - - - - - - - - - - - - - - - - -
> >>>>>>>> [ ID] Interval           Transfer     Bitrate         Retr
> >>>>>>>> [  5]   0.00-120.00 sec  1.40 GBytes   100 Mbits/sec    0             sender
> >>>>>>>> [  5]   0.00-120.00 sec  1.40 GBytes   100 Mbits/sec                  receiver
> >>>>>>>>
> >>>>>>>> iperf output from vm1 - ext0:
> >>>>>>>> [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> >>>>>>>> [  5]   0.00-1.00   sec  12.0 MBytes   101 Mbits/sec    0    150 KBytes
> >>>>>>>> [  5]   1.00-2.00   sec  11.9 MBytes  99.6 Mbits/sec    0    150 KBytes
> >>>>>>>> [  5]   2.00-3.00   sec  12.0 MBytes   101 Mbits/sec    0    150 KBytes
> >>>>>>>> [  5]   3.00-4.00   sec  11.9 MBytes  99.6 Mbits/sec  127    118 KBytes
> >>>>>>>> [  5]   4.00-5.00   sec  11.9 MBytes  99.6 Mbits/sec    0    130 KBytes
> >>>>>>>> [  5]   5.00-6.00   sec  12.0 MBytes   101 Mbits/sec    0    130 KBytes
> >>>>>>>> [  5]   6.00-7.00   sec  11.9 MBytes  99.6 Mbits/sec    0    130 KBytes
> >>>>>>>> [  5]   7.00-8.00   sec  11.9 MBytes  99.6 Mbits/sec   96    160 KBytes
> >>>>>>>> [  5]   8.00-9.00   sec  12.0 MBytes   101 Mbits/sec    0    160 KBytes
> >>>>>>>> [  5]   9.00-10.00  sec  11.9 MBytes  99.6 Mbits/sec    0    160 KBytes
> >>>>>>>> [  5]  10.00-11.00  sec  11.9 MBytes  99.6 Mbits/sec  118    130 KBytes
> >>>>>>>> [  5]  11.00-12.00  sec  12.0 MBytes   101 Mbits/sec    0    130 KBytes
> >>>>>>>> [  5]  12.00-13.00  sec  11.9 MBytes  99.6 Mbits/sec    0    130 KBytes
> >>>>>>>> [  5]  13.00-14.00  sec  11.5 MBytes  96.5 Mbits/sec    3   4.07 KBytes
> >>>>>>>> [  5]  14.00-15.00  sec  12.4 MBytes   104 Mbits/sec   93    178 KBytes
> >>>>>>>> [  5]  15.00-16.00  sec  11.9 MBytes  99.6 Mbits/sec    0    178 KBytes
> >>>>>>>> [  5]  16.00-17.00  sec  12.0 MBytes   101 Mbits/sec    0    178 KBytes
> >>>>>>>> [  5]  17.00-18.00  sec  11.9 MBytes  99.6 Mbits/sec    0    178 KBytes
> >>>>>>>> [  5]  18.00-19.00  sec  11.9 MBytes  99.6 Mbits/sec  138    130 KBytes
> >>>>>>>> [  5]  19.00-20.00  sec  12.0 MBytes   101 Mbits/sec    0    130 KBytes
> >>>>>>>> [  5]  20.00-21.00  sec  11.9 MBytes  99.6 Mbits/sec    0    130 KBytes
> >>>>>>>> [  5]  21.00-22.00  sec  12.0 MBytes   101 Mbits/sec    0    130 KBytes
> >>>>>>>> [  5]  22.00-23.00  sec  11.9 MBytes  99.6 Mbits/sec    0    130 KBytes
> >>>>>>>> [  5]  23.00-24.00  sec  11.9 MBytes  99.6 Mbits/sec    0    130 KBytes
> >>>>>>>> [  5]  24.00-25.00  sec  12.0 MBytes   101 Mbits/sec   96    195 KBytes
> >>>>>>>> [  5]  25.00-26.00  sec  11.9 MBytes  99.6 Mbits/sec    0    195 KBytes
> >>>>>>>> [  5]  26.00-27.00  sec  11.9 MBytes  99.6 Mbits/sec    0    195 KBytes
> >>>>>>>> [  5]  27.00-28.00  sec  12.0 MBytes   101 Mbits/sec    0    195 KBytes
> >>>>>>>> [  5]  28.00-29.00  sec  11.9 MBytes  99.6 Mbits/sec    0    195 KBytes
> >>>>>>>> [  5]  29.00-30.00  sec  11.9 MBytes  99.6 Mbits/sec    0    195 KBytes
> >>>>>>>> [  5]  30.00-31.00  sec  12.0 MBytes   101 Mbits/sec    0    195 KBytes
> >>>>>>>> [  5]  31.00-32.00  sec  11.9 MBytes  99.6 Mbits/sec  145    225 KBytes
> >>>>>>>> [  5]  32.00-33.00  sec  12.0 MBytes   101 Mbits/sec    0    225 KBytes
> >>>>>>>> [  5]  33.00-34.00  sec  11.9 MBytes  99.6 Mbits/sec    0    225 KBytes
> >>>>>>>> [  5]  34.00-35.00  sec  11.9 MBytes  99.6 Mbits/sec    0    225 KBytes
> >>>>>>>> [  5]  35.00-36.00  sec  12.0 MBytes   101 Mbits/sec    0    225 KBytes
> >>>>>>>> [  5]  36.00-37.00  sec  11.9 MBytes  99.6 Mbits/sec    0    225 KBytes
> >>>>>>>> [  5]  37.00-38.00  sec  11.9 MBytes  99.6 Mbits/sec    0    225 KBytes
> >>>>>>>> [  5]  38.00-39.00  sec  12.0 MBytes   101 Mbits/sec    0    225 KBytes
> >>>>>>>> [  5]  39.00-40.00  sec  11.9 MBytes  99.6 Mbits/sec  165    157 KBytes
> >>>>>>>> [  5]  40.00-41.00  sec  11.9 MBytes  99.6 Mbits/sec    0    157 KBytes
> >>>>>>>> [  5]  41.00-42.00  sec  12.0 MBytes   101 Mbits/sec    0    157 KBytes
> >>>>>>>> [  5]  42.00-43.00  sec  11.9 MBytes  99.6 Mbits/sec    0    157 KBytes
> >>>>>>>> [  5]  43.00-44.00  sec  12.0 MBytes   101 Mbits/sec  131    130 KBytes
> >>>>>>>> [  5]  44.00-45.00  sec  11.9 MBytes  99.6 Mbits/sec    0    130 KBytes
> >>>>>>>> [  5]  45.00-46.00  sec  11.9 MBytes  99.6 Mbits/sec    0    130 KBytes
> >>>>>>>> [  5]  46.00-47.00  sec  12.0 MBytes   101 Mbits/sec    0    130 KBytes
> >>>>>>>> [  5]  47.00-48.00  sec  11.9 MBytes  99.6 Mbits/sec   96    221 KBytes
> >>>>>>>> [  5]  48.00-49.00  sec  11.9 MBytes  99.6 Mbits/sec    0    221 KBytes
> >>>>>>>> [  5]  49.00-50.00  sec  12.0 MBytes   101 Mbits/sec    0    221 KBytes
> >>>>>>>> [  5]  50.00-51.00  sec  11.9 MBytes  99.6 Mbits/sec    0    221 KBytes
> >>>>>>>> [  5]  51.00-52.00  sec  12.0 MBytes   101 Mbits/sec    0    221 KBytes
> >>>>>>>> [  5]  52.00-53.00  sec  11.9 MBytes  99.6 Mbits/sec    0    221 KBytes
> >>>>>>>> [  5]  53.00-54.00  sec  11.9 MBytes  99.6 Mbits/sec  164    155 KBytes
> >>>>>>>> [  5]  54.00-55.00  sec  12.0 MBytes   101 Mbits/sec    0    155 KBytes
> >>>>>>>> [  5]  55.00-56.00  sec  11.9 MBytes  99.6 Mbits/sec    0    155 KBytes
> >>>>>>>> [  5]  56.00-57.00  sec  11.9 MBytes  99.6 Mbits/sec    0    155 KBytes
> >>>>>>>> [  5]  57.00-58.00  sec  12.0 MBytes   101 Mbits/sec    0    155 KBytes
> >>>>>>>> [  5]  58.00-59.00  sec  11.9 MBytes  99.6 Mbits/sec    0    155 KBytes
> >>>>>>>> [  5]  59.00-60.00  sec  11.9 MBytes  99.6 Mbits/sec    0    155 KBytes
> >>>>>>>> [  5]  60.00-61.00  sec  12.0 MBytes   101 Mbits/sec  114    130 KBytes
> >>>>>>>> [  5]  61.00-62.00  sec  11.9 MBytes  99.6 Mbits/sec    0    130 KBytes
> >>>>>>>> [  5]  62.00-63.00  sec  12.0 MBytes   101 Mbits/sec    0    130 KBytes
> >>>>>>>> [  5]  63.00-64.00  sec  11.9 MBytes  99.6 Mbits/sec    0    130 KBytes
> >>>>>>>> [  5]  64.00-65.00  sec  11.9 MBytes  99.6 Mbits/sec    0    130 KBytes
> >>>>>>>> [  5]  65.00-66.00  sec  12.0 MBytes   101 Mbits/sec   96    142 KBytes
> >>>>>>>> [  5]  66.00-67.00  sec  11.9 MBytes  99.6 Mbits/sec    0    142 KBytes
> >>>>>>>> [  5]  67.00-68.00  sec  11.9 MBytes  99.6 Mbits/sec    0    142 KBytes
> >>>>>>>> [  5]  68.00-69.00  sec  12.0 MBytes   101 Mbits/sec    0    142 KBytes
> >>>>>>>> [  5]  69.00-70.00  sec  11.9 MBytes  99.6 Mbits/sec    0    142 KBytes
> >>>>>>>> [  5]  70.00-71.00  sec  12.0 MBytes   101 Mbits/sec    0    142 KBytes
> >>>>>>>> [  5]  71.00-72.00  sec  11.9 MBytes  99.6 Mbits/sec    0    142 KBytes
> >>>>>>>> [  5]  72.00-73.00  sec  11.9 MBytes  99.6 Mbits/sec  105    131 KBytes
> >>>>>>>> [  5]  73.00-74.00  sec  12.0 MBytes   101 Mbits/sec    0    131 KBytes
> >>>>>>>> [  5]  74.00-75.00  sec  11.9 MBytes  99.6 Mbits/sec    0    131 KBytes
> >>>>>>>> [  5]  75.00-76.00  sec  11.9 MBytes  99.6 Mbits/sec    0    131 KBytes
> >>>>>>>> [  5]  76.00-77.00  sec  12.0 MBytes   101 Mbits/sec    0    131 KBytes
> >>>>>>>> [  5]  77.00-78.00  sec  11.9 MBytes  99.6 Mbits/sec    0    131 KBytes
> >>>>>>>> [  5]  78.00-79.00  sec  11.9 MBytes  99.6 Mbits/sec   97    229 KBytes
> >>>>>>>> [  5]  79.00-80.00  sec  12.0 MBytes   101 Mbits/sec    0    229 KBytes
> >>>>>>>> [  5]  80.00-81.00  sec  11.9 MBytes  99.6 Mbits/sec    0    229 KBytes
> >>>>>>>> [  5]  81.00-82.00  sec  12.0 MBytes   101 Mbits/sec    0    229 KBytes
> >>>>>>>> [  5]  82.00-83.00  sec  11.9 MBytes  99.6 Mbits/sec    0    229 KBytes
> >>>>>>>> [  5]  83.00-84.00  sec  11.9 MBytes  99.6 Mbits/sec    0    229 KBytes
> >>>>>>>> [  5]  84.00-85.00  sec  12.0 MBytes   101 Mbits/sec    0    229 KBytes
> >>>>>>>> [  5]  85.00-86.00  sec  11.9 MBytes  99.6 Mbits/sec  170    163 KBytes
> >>>>>>>> [  5]  86.00-87.00  sec  11.9 MBytes  99.6 Mbits/sec    0    163 KBytes
> >>>>>>>> [  5]  87.00-88.00  sec  12.0 MBytes   101 Mbits/sec    0    163 KBytes
> >>>>>>>> [  5]  88.00-89.00  sec  11.9 MBytes  99.6 Mbits/sec    0    163 KBytes
> >>>>>>>> [  5]  89.00-90.00  sec  11.9 MBytes  99.6 Mbits/sec    0    163 KBytes
> >>>>>>>> [  5]  90.00-91.00  sec  12.0 MBytes   101 Mbits/sec    0    163 KBytes
> >>>>>>>> [  5]  91.00-92.00  sec  11.9 MBytes  99.6 Mbits/sec    0    163 KBytes
> >>>>>>>> [  5]  92.00-93.00  sec  12.0 MBytes   101 Mbits/sec  121    130 KBytes
> >>>>>>>> [  5]  93.00-94.00  sec  11.9 MBytes  99.6 Mbits/sec    0    130 KBytes
> >>>>>>>> [  5]  94.00-95.00  sec  11.9 MBytes  99.6 Mbits/sec    0    130 KBytes
> >>>>>>>> [  5]  95.00-96.00  sec  12.0 MBytes   101 Mbits/sec    0    130 KBytes
> >>>>>>>> [  5]  96.00-97.00  sec  11.9 MBytes  99.6 Mbits/sec    0    130 KBytes
> >>>>>>>> [  5]  97.00-98.00  sec  11.9 MBytes  99.6 Mbits/sec    0    130 KBytes
> >>>>>>>> [  5]  98.00-99.00  sec  12.0 MBytes   101 Mbits/sec    0    130 KBytes
> >>>>>>>> [  5]  99.00-100.00 sec  11.9 MBytes  99.6 Mbits/sec    0    130 KBytes
> >>>>>>>> [  5] 100.00-101.00 sec  12.0 MBytes   101 Mbits/sec    0    130 KBytes
> >>>>>>>> [  5] 101.00-102.00 sec  11.9 MBytes  99.6 Mbits/sec   96    176 KBytes
> >>>>>>>> [  5] 102.00-103.00 sec  11.9 MBytes  99.6 Mbits/sec    0    176 KBytes
> >>>>>>>> [  5] 103.00-104.00 sec  12.0 MBytes   101 Mbits/sec    0    176 KBytes
> >>>>>>>> [  5] 104.00-105.00 sec  11.9 MBytes  99.6 Mbits/sec    0    176 KBytes
> >>>>>>>> [  5] 105.00-106.00 sec  11.9 MBytes  99.6 Mbits/sec    0    176 KBytes
> >>>>>>>> [  5] 106.00-107.00 sec  12.0 MBytes   101 Mbits/sec    0    176 KBytes
> >>>>>>>> [  5] 107.00-108.00 sec  11.9 MBytes  99.6 Mbits/sec    0    176 KBytes
> >>>>>>>> [  5] 108.00-109.00 sec  11.9 MBytes  99.6 Mbits/sec    0    176 KBytes
> >>>>>>>> [  5] 109.00-110.00 sec  12.0 MBytes   101 Mbits/sec    0    176 KBytes
> >>>>>>>> [  5] 110.00-111.00 sec  11.9 MBytes  99.6 Mbits/sec  130    130 KBytes
> >>>>>>>> [  5] 111.00-112.00 sec  12.0 MBytes   101 Mbits/sec    0    130 KBytes
> >>>>>>>> [  5] 112.00-113.00 sec  11.9 MBytes  99.6 Mbits/sec    0    130 KBytes
> >>>>>>>> [  5] 113.00-114.00 sec  11.9 MBytes  99.6 Mbits/sec    0    130 KBytes
> >>>>>>>> [  5] 114.00-115.00 sec  12.0 MBytes   101 Mbits/sec    0    130 KBytes
> >>>>>>>> [  5] 115.00-116.00 sec  11.9 MBytes  99.6 Mbits/sec    0    130 KBytes
> >>>>>>>> [  5] 116.00-117.00 sec  11.9 MBytes  99.6 Mbits/sec    0    130 KBytes
> >>>>>>>> [  5] 117.00-118.00 sec  12.0 MBytes   101 Mbits/sec    0    130 KBytes
> >>>>>>>> [  5] 118.00-119.00 sec  11.9 MBytes  99.6 Mbits/sec    0    130 KBytes
> >>>>>>>> [  5] 119.00-120.00 sec  12.0 MBytes   101 Mbits/sec   96    237 KBytes
> >>>>>>>> - - - - - - - - - - - - - - - - - - - - - - - - -
> >>>>>>>> [ ID] Interval           Transfer     Bitrate         Retr
> >>>>>>>> [  5]   0.00-120.00 sec  1.40 GBytes   100 Mbits/sec  2397             sender
> >>>>>>>> [  5]   0.00-120.00 sec  1.40 GBytes   100 Mbits/sec                  receiver
> >>>>>>>>
> >>>>>>>> So if you don't have anything against it I would upload v3 which
> >>>>>>>> will default to 0, meaning disabled.
> >>>>>>>
> >>>>>>> Thanks Ales for sharing the test result! Looking at the two tests,
> >>>>>>> the one with mac-binding removed periodically (for ext0) had
> >>>>>>> occasional retransmissions and the window size couldn't reach the
> >>>>>>> peak, while the other one without mac-binding deletion had no
> >>>>>>> retransmissions and kept the window size at 290KB constantly.
> >>>>>>> However, they end up with the same throughput number, so maybe the
> >>>>>>> disturbance was not significant enough to affect the throughput for
> >>>>>>> this comparison. I wonder if there are more obvious differences if
> >>>>>>> tested with a higher bandwidth environment, e.g. with 10G, 25G or
> >>>>>>> even higher line rate. I will find some time to test this in our
> >>>>>>> data center environment.
> >>>>>>
> >>>>>>
> >>>>>> I actually tried it with the maximum that my computer can handle.
> >>>>>> It was around 18G for both flows and the results were more or less
> >>>>>> the same. The throughput was stable and there were some
> >>>>>> retransmissions, but overall the connection looked ok.
> >>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On the other hand, as mentioned in an earlier reply, if
> >>>>>>> mac-binding deletions at relatively long intervals don't affect
> >>>>>>> overall performance, we shouldn't even need to check the idle_age
> >>>>>>> of OVS flows. It would simplify the implementation a lot to have
> >>>>>>> northd check a timestamp and delete expired entries, without
> >>>>>>> checking idle_age at all. Maintaining the ownership of the
> >>>>>>> mac-binding records and doing all the idle_age checks doesn't seem
> >>>>>>> to provide us any extra benefit, right? Please also see my response
> >>>>>>> to Dumitru's comment below.
> >>>>>>
> >>>>>>
> >>>>>> That is actually a good point. If we can prove through testing that
> >>>>>> removal of a MAC binding does not affect the flows of others, which
> >>>>>> from the results above seems to be the case, we can probably skip
> >>>>>> the whole ownership. That would really reduce it to checking in
> >>>>>> northd whether something went over the threshold. I am planning to
> >>>>>> make a bigger test with more iperf flows (100) in a similar setup,
> >>>>>> and also some latency tests to see how much latency is affected by
> >>>>>> the MAC binding removal.
> >>>>>>
> >>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Ales
> >>>>>>>>
> >>>>>>>> On Mon, Jun 27, 2022 at 5:53 PM Dumitru Ceara <[email protected]> wrote:
> >>>>>>>>>
> >>>>>>>>> On 6/24/22 22:56, Han Zhou wrote:
> >>>>>>>>>> On Fri, Jun 24, 2022 at 12:41 PM Numan Siddique <[email protected]> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> On Fri, Jun 24, 2022 at 11:49 AM Han Zhou <[email protected]> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Fri, Jun 24, 2022 at 1:11 AM Ales Musil <[email protected]> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Hi Han,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> after our discussion I did the suggested test and the
> >>>>>>>>>>>>> throughput does not seem to be affected. I did the test
> >>>>>>>>>>>>> with aging set to 2 sec, and during the test period
> >>>>>>>>>>>>> (360 sec) the MAC binding was removed multiple times.
> >>>>>>>>>>>>> There were some dropped packets, but the traffic was
> >>>>>>>>>>>>> maintained with minimal disturbance.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks for sharing the result! I think different applications
> >>>>>>>>>>>> may react to this kind of disturbance differently. Some may be
> >>>>>>>>>>>> sensitive to packet loss. In addition, I believe this would
> >>>>>>>>>>>> also incur megaflow cache misses and trigger OVS userspace
> >>>>>>>>>>>> processing in the middle of a flow.
> >>>>>>>>>>>> May I know the traffic pattern of your test? Did you measure
> >>>>>>>>>>>> with iperf during the test? Could you share the numbers with
> >>>>>>>>>>>> vs. without the drops?
> >>>>>>>>>>>>
> >>>>>>>>>>>> On the other hand, if such random disturbance is not
> >>>>>>>>>>>> considered harmful for some deployment, then I would also
> >>>>>>>>>>>> question the value of doing all those OVS flow idle_age checks
> >>>>>>>>>>>> on the *owner* chassis. There can be lots of chassis consuming
> >>>>>>>>>>>> the same mac-binding entry but we are now checking "at least
> >>>>>>>>>>>> one of them is not using the entry recently", which doesn't
> >>>>>>>>>>>> sound too different from just blindly expiring the entries
> >>>>>>>>>>>> without checking anything, and letting them be recreated if
> >>>>>>>>>>>> someone still needs them - if the minimal disturbance is
> >>>>>>>>>>>> acceptable in such an environment. ovn-northd can do this
> >>>>>>>>>>>> periodic check easily and clean the expired entries, correct?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks,
> >>>>>>>>>>>> Han
> >>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>> Ales
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Wed, Jun 22, 2022 at 9:51 AM Ales Musil <[email protected]> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Wed, Jun 22, 2022 at 9:21 AM Han Zhou <[email protected]> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Fri, Jun 17, 2022 at 2:08 AM Ales Musil <[email protected]> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Add MAC binding aging mechanism, that
> >>>>>>>>>>>>>>>> should take care of stale MAC bindings.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> The mechanism works on "ownership" of the
> >>>>>>>>>>>>>>>> MAC binding row. The chassis that creates
> >>>>>>>>>>>>>>>> the row is then checking if the "idle_age"
> >>>>>>>>>>>>>>>> of the flow is over the aging threshold.
> >>>>>>>>>>>>>>>> In that case the MAC binding is removed
> >>>>>>>>>>>>>>>> from database. The "owner" might change
> >>>>>>>>>>>>>>>> when another chassis saw an update of the
> >>>>>>>>>>>>>>>> MAC address.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> This approach has a downside: the chassis
> >>>>>>>>>>>>>>>> that "owns" the MAC binding might not actually be
> >>>>>>>>>>>>>>>> the one that is using it actively. This
> >>>>>>>>>>>>>>>> might lead to some delays in packet flow when
> >>>>>>>>>>>>>>>> the row is removed.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi Han, thank you for your input.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks Ales for working on this! The stale entries in the
> >>>>>>>>>>>>>>> MAC_Binding table were a big TODO of OVN and a difficult
> >>>>>>>>>>>>>>> problem. It is great to see a solution finally, and I think
> >>>>>>>>>>>>>>> utilizing the "idle_age" is brilliant. Before reviewing it
> >>>>>>>>>>>>>>> in more detail, I'd like to discuss the "downside" first.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I think the "downside" here is indeed a problem with this
> >>>>>>>>>>>>>>> approach. The MAC binding in OVN is in fact the ARP cache
> >>>>>>>>>>>>>>> (or neighbour table) of the router, but the OVN logical
> >>>>>>>>>>>>>>> router is distributed (except for gateway-router and DGP),
> >>>>>>>>>>>>>>> so in most cases, by nature of the OVN LR, the user of a
> >>>>>>>>>>>>>>> MAC binding wouldn't be the one that "owns" it. It would be
> >>>>>>>>>>>>>>> a big dataplane performance impact: think about a chassis
> >>>>>>>>>>>>>>> that has a flow with high packet throughput and suddenly
> >>>>>>>>>>>>>>> needs to pause and wait for ovn-controller (and SB DB) to
> >>>>>>>>>>>>>>> complete the ARP resolution process. I saw this being
> >>>>>>>>>>>>>>> pointed out and discussed in the first version, but I'd
> >>>>>>>>>>>>>>> like to raise more attention to it, because the problem
> >>>>>>>>>>>>>>> introduced would be much bigger than the stale entries in
> >>>>>>>>>>>>>>> the MAC binding table.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I think the proposal from Daniel that transfers owner with
> >>>>>>>>>>>>>>> "expire timestamp" set would help, but I am also thinking
> >>>>>>>>>>>>>>> that since the logical router is distributed, it may be
> >>>>>>>>>>>>>>> unreasonable to have an owner at all. My suggestion is,
> >>>>>>>>>>>>>>> instead of assigning an "owner" for each entry, a central
> >>>>>>>>>>>>>>> controller can just be responsible for checking if any
> >>>>>>>>>>>>>>> chassis still uses the entry and removing it when no one
> >>>>>>>>>>>>>>> uses it anymore. Naturally the central controller can be
> >>>>>>>>>>>>>>> hosted in ovn-northd. Here is the detailed algorithm I am
> >>>>>>>>>>>>>>> thinking of:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> * when an entry is created (by any ovn-controller), an
> >>>>>>>>>>>>>>>   expire_timestamp is set (e.g. 10 min from now - can be
> >>>>>>>>>>>>>>>   configurable)
> >>>>>>>>>>>>>>> * Each ovn-controller: check the entries it uses and if the
> >>>>>>>>>>>>>>>   expire_timestamp of the entry is past, but its own
> >>>>>>>>>>>>>>>   "idle_age" indicates the entry is still needed, it will
> >>>>>>>>>>>>>>>   update the SB DB entry with a new expire_timestamp. Note:
> >>>>>>>>>>>>>>>   before updating the SB DB, ovn-controller needs a random
> >>>>>>>>>>>>>>>   delay, to avoid an unnecessary update storm to SB - in
> >>>>>>>>>>>>>>>   most cases only one ovn-controller would update/refresh
> >>>>>>>>>>>>>>>   the SB DB when an entry is expired.
> >>>>>>>>>>>>>>> * ovn-northd periodically checks if there are entries whose
> >>>>>>>>>>>>>>>   expire_timestamp is more than 1 min in the past (this is
> >>>>>>>>>>>>>>>   related to the random delay of ovn-controller, may be
> >>>>>>>>>>>>>>>   configurable, too) and, if so, goes ahead and deletes
> >>>>>>>>>>>>>>>   them.
> >>>>>>>>>>>>>>>
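The three steps proposed above could be sketched roughly as follows. This
is only an illustration in Python, not OVN code: the MacBinding class, the
function names, and the rule that an idle_age below the lifetime means
"still needed" are assumptions made for the example. The 10 min lifetime
and the 1 min grace period are the example values from the proposal.

```python
import random
from dataclasses import dataclass

LIFETIME = 600  # seconds; the "10 min from now" example value
GRACE = 60      # seconds; how long northd waits past expiry


@dataclass
class MacBinding:  # hypothetical stand-in for an SB MAC_Binding row
    ip: str
    mac: str
    expire_timestamp: float


def create_binding(ip, mac, now, lifetime=LIFETIME):
    """Step 1: the creating ovn-controller sets an expiry time."""
    return MacBinding(ip, mac, now + lifetime)


def needs_refresh(binding, idle_age, now, lifetime=LIFETIME):
    """Step 2a: the entry expired, but this chassis used it recently.

    Assumption: idle_age < lifetime is read as "still needed".
    """
    return now >= binding.expire_timestamp and idle_age < lifetime


def refresh_delay(max_delay=GRACE, rng=random):
    """Step 2b: random delay so that usually one chassis updates SB."""
    return rng.uniform(0, max_delay)


def refresh(binding, now, lifetime=LIFETIME):
    """Step 2c: after the delay, refresh only if nobody else did."""
    if now >= binding.expire_timestamp:
        binding.expire_timestamp = now + lifetime
        return True
    return False


def northd_sweep(bindings, now, grace=GRACE):
    """Step 3: keep only entries not expired for longer than grace."""
    return [b for b in bindings if now - b.expire_timestamp <= grace]
```

The re-check inside refresh() is what lets the random delay do its job: a
chassis that wakes up after another one has already written a new
expire_timestamp sees a future timestamp and skips its own SB update.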
> >>>>>>>>>>>>>>> What do you think?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> This is actually pretty close to the first approach that was
> >>>>>>>>>>>>>> suggested in the BZ [0] for this. However your suggestion
> >>>>>>>>>>>>>> would cause less SB traffic, which is great. I would still
> >>>>>>>>>>>>>> be a bit worried that in big setups there could be a lot of
> >>>>>>>>>>>>>> controllers trying to postpone the deletion of a particular
> >>>>>>>>>>>>>> MAC binding. We are running some scale tests with the v2
> >>>>>>>>>>>>>> patch set, so we should have some answers on whether the
> >>>>>>>>>>>>>> downside is causing any visible trouble.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I will definitely discuss this suggestion with the rest of
> >>>>>>>>>>>>>> the team.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> In addition, such a change may still be risky in large
> >>>>>>>>>>>>>>> scale environments, and I think it is worth experimenting
> >>>>>>>>>>>>>>> first with a knob to enable it (disabled by default).
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> That would be in line with what Mark suggested, a special
> >>>>>>>>>>>>>> value that disables MAC binding aging, e.g. threshold=0,
> >>>>>>>>>>>>>> which could be the default.
> >>>>>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> +1 for keeping this disabled by default for now.
> >>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks Ales for working on this.  I haven't reviewed the patch
> >>>>>>>>>>> series. Just providing some comments and my 2 cent thoughts
> >>>>>>>>>>>
> >>>>>>>>>>> 1. If it's possible I'd avoid querying OVS to get the flow
> >>>>>>>>>>>    stats and determine if a mac binding entry is stale/expired
> >>>>>>>>>>>    or not. If there is no other way, then I'm fine with it.
> >>>>>>>>>>>
> >>>>>>>>>>> 2. Before taking that approach we can perhaps explore another
> >>>>>>>>>>>    way to do it. My initial thought is:
> >>>>>>>>>>>      - Each mac binding is owned by one ovn-controller
> >>>>>>>>>>>        (probably the one which learnt it)
> >>>>>>>>>>>      - And periodically, it will generate an arp request for
> >>>>>>>>>>>        the learnt IP of the mac binding entry.
> >>>>>>>>>
> >>>>>>>>> I think if we go with this approach it's probably desirable that
> >>>>>>>>> these periodic probes are unicast instead of regular broadcast
> >>>>>>>>> ARP requests. We also need the CMS to provision (or some
> >>>>>>>>> automatic way of provisioning) a unique per-chassis source MAC to
> >>>>>>>>> be used for such packets.
> >>>>>>>>>
> >>>>>>>>>>>      - If that mac binding is still intact, we will receive an
> >>>>>>>>>>>        arp response. And ovn-controller handling this arp
> >>>>>>>>>>>        response will mark this mac binding entry as still
> >>>>>>>>>>>        active.
> >>>>>>>>>>>      - If no response, then this mac binding entry is deleted.
> >>>>>>>>>>>
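One probe round of the owner-based idea in 2) could look roughly like the
sketch below. It is plain Python for illustration only: send_probe is a
hypothetical callback standing in for "send an ARP request and wait for the
reply", and the dictionaries are a simplified view of MAC_Binding rows and
their owning chassis, not the real schema.

```python
from typing import Callable, Dict, Tuple

Key = Tuple[str, str]  # (logical port, IP); simplified MAC_Binding key


def probe_owned_bindings(
    bindings: Dict[Key, str],
    owners: Dict[Key, str],
    chassis: str,
    send_probe: Callable[[str, str], bool],
) -> Dict[Key, str]:
    """Return the bindings that survive one probe round run by `chassis`.

    send_probe(ip, mac) is hypothetical: it returns True if an ARP reply
    for `ip` arrived in time, False otherwise.
    """
    survivors = {}
    for key, mac in bindings.items():
        _port, ip = key
        if owners.get(key) != chassis:
            survivors[key] = mac  # owned by another chassis: untouched
        elif send_probe(ip, mac):
            survivors[key] = mac  # reply seen: entry is still active
        # no reply: the owner deletes the entry
    return survivors
```

Passing the known MAC to send_probe leaves room for the unicast probes
suggested in the reply above, instead of broadcast ARP requests.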
> >>>>>>>>>>> I don't think this will be easy to implement, as presently we
> >>>>>>>>>>> first check if we have already learnt the mac binding entry or
> >>>>>>>>>>> not (using ovn action lookup_arp/lookup_nd).
> >>>>>>>>>>> When we receive the arp response from the mac binding ip, we
> >>>>>>>>>>> should still send the packet to ovn-controller even if
> >>>>>>>>>>> lookup_arp/nd returns success.
> >>>>>>>>>>>
> >>>>>>>>>>> What do you all think ?  Does this seem doable ?
> >>>>>>>>>>
> >>>>>>>>>> Thanks Numan. I think 2) is probably a good way to go. It is
> >>>>>>>>>> different from the idea of deleting the entries not being used;
> >>>>>>>>>> instead it just deletes entries that are not valid any more. In
> >>>>>>>>>> theory it is possible that there will still be lots of valid but
> >>>>>>>>>> unused entries in the DB, but in practice the number of alive
> >>>>>>>>>> end-points is usually limited, so valid but unused entries
> >>>>>>>>>> shouldn't be harmful enough. There are no dataplane concerns
> >>>>>>>>>> with this approach, and the control plane cost also seems not
> >>>>>>>>>> significant, so I think it is something worth trying.
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I think that in the end, if we want a solution that works for all
> >>>>>>>>> cases, we probably need to implement both approaches.  In essence
> >>>>>>>>> this seems to correspond to implementing mechanisms (1) and (2)
> >>>>>>>>> from "2.3.2.1  ARP Cache Validation" in RFC 1122:
> >>>>>>>>>
> >>>>>>>>> https://datatracker.ietf.org/doc/html/rfc1122#page-22
> >>>>>>>>>
> >>>>>>>>> Any of these is better than the current behavior so it shouldn't
> >>>>>>>>> matter too much which one we take first as long as there's no
> >>>>>>>>> dataplane impact.
> >>>>>>>>>
> >>>>>>>
> >>>>>>> Thanks Dumitru for the reference. It seems none of the approaches
> >>>>>>> mentioned in the RFC consider if the ARP entry is in use or not.
> >>>>>>> (1) is merely implementing a timeout for each entry; (2) is to
> >>>>>>> delete only if the entry is not valid any more (what Numan
> >>>>>>> suggested). I think we can start with (1), with a configurable
> >>>>>>> timeout (and 0 means never time out, like it is today), and (2) is
> >>>>>>> a more advanced approach but also a more complex implementation -
> >>>>>>> we can implement it if (1) is not sufficient for all the use cases.
> >>>>>>
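For comparison, approach (1) with a configurable timeout reduces to a very
small check. The sketch below is illustrative Python only; the function
name and the dictionary of per-entry timestamps are assumptions, and the
only behaviour taken from the thread is that a timeout of 0 means entries
never expire.

```python
def expired_entries(created_at, now, timeout):
    """Return the keys whose age has reached `timeout` seconds.

    created_at maps an entry key (here just the IP) to the time it was
    learnt. A timeout of 0 disables aging, matching today's behaviour.
    """
    if timeout == 0:
        return []
    return sorted(k for k, ts in created_at.items() if now - ts >= timeout)
```

Nothing here looks at idle_age or at who owns the entry, which is the
simplification being argued for: an entry that is still needed is expected
to be learnt again after deletion.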
> >>>>>>
> >>>>>> In 1) they mention a refresh when we observe ARP for the same
> >>>>>> entry; for that we would probably require an additional controller
> >>>>>> action that would just bump the timestamp or something like that.
> >>>>>> 2) can be approached in two ways: a) Remove the entry periodically
> >>>>>> or when it does not respond. b) Remove the entry when it does not
> >>>>>> respond and it timed out.
> >>>>>> a) does not add any value to the overall process, as the entry
> >>>>>> would be removed nevertheless.
> >>>>>> b) has the disadvantage that the MAC binding table would keep
> >>>>>> destinations that are still reachable, but might not be used at all.
> >>>>>>
> >>>>>> Anyway, as you have written, this approach is more up for discussion
> >>>>>> once we find out that the first part does not prove to be efficient
> >>>>>> enough.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Ales
> >>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Han
> >>>>>>>
> >>>>>>>>> Ales, would it also be possible to test your implementation with
> >>>>>>>>> multiple traffic streams between VIFs (more than 2 MAC_Bindings
> >>>>>>>>> in use) to make sure that openflow changes due to an expiring
> >>>>>>>>> MAC_Binding do not affect unrelated sessions (due to datapath
> >>>>>>>>> flow recalculation/eviction)?
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Dumitru
> >>>>>>>>>
> >>>>>>>>>> Thanks,
> >>>>>>>>>> Han
> >>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Numan
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>> Han
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>> Ales
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> [0] https://bugzilla.redhat.com/2084668#c2
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> The threshold can be configured in
> >>>>>>>>>>>>>>>> NB_global table with key "mac_binding_age_threshold"
> >>>>>>>>>>>>>>>> in seconds with default value being 60.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> The test case is present as separate patch of the series.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Add delay to ARP response processing to prevent
> >>>>>>>>>>>>>>>> race condition between multiple controllers
> >>>>>>>>>>>>>>>> that received the same ARP.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Ales Musil (6):
> >>>>>>>>>>>>>>>>   Add chassis column to MAC_Binding table
> >>>>>>>>>>>>>>>>   Add MAC binding aging mechanism
> >>>>>>>>>>>>>>>>   Add stopwatch for MAC binding aging
> >>>>>>>>>>>>>>>>   Allow the MAC binding age threshold to be configurable
> >>>>>>>>>>>>>>>>   ovn.at: Add test case covering the MAC binding aging
> >>>>>>>>>>>>>>>>   pinctrl.c: Add delay after ARP packet
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>  controller/automake.mk         |   4 +-
> >>>>>>>>>>>>>>>>  controller/mac-binding-aging.c | 241 +++++++++++++++++++++++++++++++++
> >>>>>>>>>>>>>>>>  controller/mac-binding-aging.h |  32 +++++
> >>>>>>>>>>>>>>>>  controller/ovn-controller.c    |  32 +++++
> >>>>>>>>>>>>>>>>  controller/pinctrl.c           |  73 ++++++++--
> >>>>>>>>>>>>>>>>  northd/northd.c                |  12 ++
> >>>>>>>>>>>>>>>>  northd/ovn-northd.c            |   2 +-
> >>>>>>>>>>>>>>>>  ovn-nb.xml                     |   5 +
> >>>>>>>>>>>>>>>>  ovn-sb.ovsschema               |   6 +-
> >>>>>>>>>>>>>>>>  ovn-sb.xml                     |   5 +
> >>>>>>>>>>>>>>>>  tests/ovn.at                   | 212 +++++++++++++++++++++++++++--
> >>>>>>>>>>>>>>>>  11 files changed, 595 insertions(+), 29 deletions(-)
> >>>>>>>>>>>>>>>>  create mode 100644 controller/mac-binding-aging.c
> >>>>>>>>>>>>>>>>  create mode 100644 controller/mac-binding-aging.h
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>> 2.35.3
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> _______________________________________________
> >>>>>>>>>>>>>>>> dev mailing list
> >>>>>>>>>>>>>>>> [email protected]
> >>>>>>>>>>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Ales Musil
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Senior Software Engineer - OVN Core
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Red Hat EMEA
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> [email protected]    IM: amusil
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>
> >
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
