On Wed, Jul 13, 2022 at 1:49 AM Dumitru Ceara <[email protected]> wrote:
>
> On 7/13/22 08:45, Han Zhou wrote:
> > On Tue, Jul 12, 2022 at 1:02 AM Dumitru Ceara <[email protected]> wrote:
> >>
> >> On 7/11/22 09:26, Ales Musil wrote:
> >>> On Mon, Jul 11, 2022 at 8:51 AM Han Zhou <[email protected]> wrote:
> >>>
> >>>>
> >>>> On Fri, Jul 1, 2022 at 3:19 AM Ales Musil <[email protected]> wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> as promised I have more results from testing. The results are available on the BZ comment 6 [0].
> >>>>>
> >>>>> So about the scenario, there were two traffic types being sent. The document has TCP latency and throughput, and UDP throughput.
> >>>>> The traffic was split into two halves: 625 connections (50 flows) were left without touching the MAC binding table. The same amount
> >>>>> (625 connections, 50 flows) was disrupted by periodic removal of MAC bindings every 20 sec.
> >>>>>
> >>>>> IMO the results prove two points:
> >>>>> 1) Removal of MAC binding does not seem to affect unrelated flows, which was a huge concern.
> >>>>> 2) There might be some added value in keeping the connection alive as long as it is used. The UDP disrupted graph shows
> >>>>> that the throughput was not able to catch up again after the deletion, though that is debatable, since the ramp-up time is probably longer than
> >>>>> the removal interval, and 20 s is not really sensible for production. Anyway the connection check would prevent that, but only on the
> >>>>> "owner" chassis.
> >>>>>
> >>>>> So now it's probably about deciding what compromise to make: either keeping the owner, with a potential improvement to ownership transfer, or
> >>>>> having just a simple timeout that removes anything that has expired.
> >>>>>
> >>>>> Han, Dumitru, Numan
> >>>>> please let me know what you think about these results.
> >>>>>
> >>>>
> >>>> Thanks Ales for the detailed test results.
> >>>> From the graphs you shared, it
> >>>> does look like the impact is quite obvious, for both throughput and
> >>>> latency, even for TCP, right? Look at the TCP throughput line, for about
> >>>> 50% of the time it was below 2G, while the one without disruption was above
> >>>> 10G for most of the time.
> >>>> Did I interpret the graph correctly? Why is it different from your
> >>>> observation earlier (is it because of the max bandwidth of the test env)?
> >>>> Does it suggest that a simple timeout mechanism is not suitable?
> >>>>
> >>>> Thanks,
> >>>> Han
> >>>>
> >>>
> >>> Hi Han,
> >>> yes you interpreted it correctly. I think that the difference is because of
> >>> the very high bandwidth (within multiple flows) in this test and the
> >>> removal was quite quick. So the traffic was not able to catch up again in
> >>> time before it got disrupted again. Also Xena seems to wait a bit for the
> >>> other related traffic before it continues the previous stream so that might
> >>> have added a bit of jitter on its own. The simple timeout might still be a
> >>> solution, but we would need to make sure of two things:
> >>> a) The timeout is large enough so if there is high bandwidth traffic it can
> >>> catch up again.
> >>
> > Sounds reasonable, provided that the feature is configurable and can be
> > disabled for environments that are more sensitive to such disruptions.
> >
>
> I would go a step further and make it configurable per logical router.
> Thinking of the ovn-k8s use case (but probably applicable to OpenStack
> and other CMS), there's probably less impact if we enable the feature on
> gateway routers compared to if we enable it on the distributed central
> cluster router.
>

+1. Good point.
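
To make the "simple timeout, configurable per logical router" idea a bit more concrete, here is a minimal Python sketch. It is an illustration only, not OVN code: the MacBinding record, the thresholds mapping and the expired_bindings() helper are hypothetical names, and it assumes that every binding carries a creation timestamp and that a threshold of 0 (or no threshold at all) means aging is disabled for that router.

```python
from dataclasses import dataclass


@dataclass
class MacBinding:
    """Hypothetical stand-in for one MAC binding row."""
    router: str        # logical router that learned the binding
    ip: str
    mac: str
    created_at: float  # seconds, when the entry was added


def expired_bindings(bindings, thresholds, now):
    """Return the bindings whose age reached their router's threshold.

    thresholds maps a router name to an aging threshold in seconds.
    A missing or zero threshold disables aging for that router.
    """
    expired = []
    for binding in bindings:
        threshold = thresholds.get(binding.router, 0)
        if threshold > 0 and now - binding.created_at >= threshold:
            expired.append(binding)
    return expired


if __name__ == "__main__":
    table = [
        MacBinding("gw-router", "10.0.0.1", "aa:aa:aa:aa:aa:01", 0.0),
        MacBinding("cluster-router", "10.0.0.2", "aa:aa:aa:aa:aa:02", 0.0),
        MacBinding("gw-router", "10.0.0.3", "aa:aa:aa:aa:aa:03", 50.0),
    ]
    # Aging enabled only on the gateway router, with a 60 s threshold.
    for binding in expired_bindings(table, {"gw-router": 60}, now=70.0):
        print("would remove", binding.ip, "from", binding.router)
```

With aging enabled only on the gateway router, the entry on the distributed router is never touched, which matches the lower-impact deployment described above.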
> >> I went back to the RFC as this seems like it would have significant
> >> forwarding impact if we unconditionally remove ARP entries:
> >>
> >> https://datatracker.ietf.org/doc/html/rfc1122#page-22
> >>
> >> 2.3.2.1 ARP Cache Validation
> >> [..]
> >>
> >> (1) Timeout -- Periodically time out cache entries,
> >>     even if they are in use. Note that this timeout
> >>     should be restarted when the cache entry is
> >>     "refreshed" (by observing the source fields,
> >>     regardless of target address, of an ARP broadcast
> >>     from the system in question). For proxy ARP
> >>     situations, the timeout needs to be on the order
> >>     of a minute.
> >>
> >> This mentions that an ARP entry's timeout should be restarted if the
> >> entry is refreshed. That is, when the host for which we have the ARP
> >> entry sent an ARP broadcast.
> >>
> >> I think that makes it less likely to remove the entry while IP traffic
> >> is using it.
> >>
> >> In our case that might be too complex to implement
> >
> > I haven't thought through all details yet, but can we just handle ARP
> > requests through some extra pipelines with slowpath actions which refresh
> > the expire time? Maybe some ratelimit (meter) and some random delay is
> > needed, but it seems to be structurally independent and clear. Could you
> > point out the blockers that I might have missed?
> >
>
> Sounds possible. We do have to be careful though because today ARP
> requests are only forwarded inside the OVN switch pipeline on the ports
> connected to routers that own the target IP of the ARP request. That
> means the other routers connected to the logical switch won't process
> these ARP requests and may fail to refresh their timers.

Today it is controlled by the "always_learn_from_arp_request" option. By default it is true, and all the requests are handled regardless of the target.
Even when it is "false", if the documentation is still up to date, it checks whether there is already a mac-binding entry on the LRP for the sender that needs to be updated:

    false - If there is a MAC binding for that IP and the MAC is different, or, if TPA of ARP request belongs to any router port on this router, then update/add that MAC-IP binding. Otherwise, don’t update/add entries.

So, for the timeout refresh, this needs to be changed: if there is a MAC binding for that IP, not only check if it needs to be updated, but also refresh a timestamp.

Thanks,
Han

>
> In any case, this is a detail we can probably iron out later.
>
> >> so maybe we should
> >> consider going back to Numan's suggestion, essentially:
> >>
> >> (2) Unicast Poll -- Actively poll the remote host by
> >>     periodically sending a point-to-point ARP Request
> >>     to it, and delete the entry if no ARP Reply is
> >>     received from N successive polls. Again, the
> >>     timeout should be on the order of a minute, and
> >>     typically N is 2.
> >>
> >
> > This approach seems more complex to me than (1), because it requires the
> > "owner" role, and it is not easy to manage, because the "owner" (in fact it
> > is just an agent) may need to be transferred.
> > However, of course there is a benefit of this approach: it would never have
> > a case when an entry is needed but was deleted, thus ensuring no dataplane
> > impact.
> >
> > So, my suggestion is:
> > step 0.5 - implement (1) without the refresh mechanism - just experimental
> > step 1 - implement (1) with refresh mechanism
> > step 2 - (2) as a more advanced (but also more complex) implementation, if (1)
> > proves insufficient
> >
>
> Sounds like a good plan forward.
>
> Thanks,
> Dumitru
>
> > Thanks,
> > Han
> >
> >> Thoughts?
> >>
> >>> b) We need to prevent bulk removals of MAC bindings that were added at
> >>> roughly the same time.
> >>> So some sort of limitation of how many we can remove
> >>> in a single transaction or adding some random delay
> >>> upon creation. Or we can actually do both because there is no harm if the
> >>> MAC binding is removed later than the threshold.
> >>
> >> Ales, you posted
> >>
> >> https://patchwork.ozlabs.org/project/ovn/patch/[email protected]/
> >> which should cover the addition part if I'm not wrong, but I agree with
> >> Han's concern there about combining the mac binding aging with the
> >> random addition delay.
> >>
> >> A random removal delay seems less risky at first glance.
> >>
> >> Thanks,
> >> Dumitru
> >>
> >>>
> >>> Thanks,
> >>> Ales
> >>>
> >>>>
> >>>>> Regards,
> >>>>> Ales
> >>>>>
> >>>>> [0] https://bugzilla.redhat.com/show_bug.cgi?id=2084668#c6
> >>>>>
> >>>>> On Thu, Jun 30, 2022 at 7:32 AM Ales Musil <[email protected]> wrote:
> >>>>>>
> >>>>>> On Thu, Jun 30, 2022 at 6:58 AM Han Zhou <[email protected]> wrote:
> >>>>>>>
> >>>>>>> On Mon, Jun 27, 2022 at 11:55 PM Ales Musil <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> so I did the suggested test. Setup was HIV1 - ext0 and vm0, HIV2 - ext1 and vm1
> >>>>>>>>
> >>>>>>>> The networks were connected as follows:
> >>>>>>>> - vm0 and vm1 on the same switch
> >>>>>>>> - logical router connected with the "internal" and "external" switch
> >>>>>>>> - "external" switch connected to ext0 and ext1 through localnet
> >>>>>>>>
> >>>>>>>> So the traffic was flowing:
> >>>>>>>> vmX -- LR -- localnet -- extX
> >>>>>>>>
> >>>>>>>> The iperf was running between vm0 - ext1 and vm1 - ext0.
> >>>>>>>>
> >>>>>>>> I have removed the MAC binding for ext0 multiple times to see if it affects the other traffic.
> >>>>>>>> And it actually does not, which is great.
> >>>>>>>> > >>>>>>>> iperf output from vm0 - ext1: > >>>>>>>> [ ID] Interval Transfer Bitrate Retr Cwnd > >>>>>>>> [ 5] 0.00-1.00 sec 12.0 MBytes 101 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 1.00-2.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 2.00-3.00 sec 12.0 MBytes 101 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 3.00-4.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 4.00-5.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 5.00-6.00 sec 12.0 MBytes 101 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 6.00-7.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 7.00-8.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 8.00-9.00 sec 12.0 MBytes 101 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 9.00-10.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 10.00-11.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 11.00-12.00 sec 12.0 MBytes 101 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 12.00-13.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 13.00-14.00 sec 12.0 MBytes 101 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 14.00-15.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 15.00-16.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 16.00-17.00 sec 12.0 MBytes 101 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 17.00-18.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 18.00-19.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 19.00-20.00 sec 12.0 MBytes 101 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 20.00-21.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 21.00-22.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 22.00-23.00 sec 12.0 MBytes 101 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 23.00-24.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 
24.00-25.00 sec 12.0 MBytes 101 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 25.00-26.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 26.00-27.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 27.00-28.00 sec 12.0 MBytes 101 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 28.00-29.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 29.00-30.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 30.00-31.00 sec 12.0 MBytes 101 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 31.00-32.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 32.00-33.00 sec 12.0 MBytes 101 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 33.00-34.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 34.00-35.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 35.00-36.00 sec 12.0 MBytes 101 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 36.00-37.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 37.00-38.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 38.00-39.00 sec 12.0 MBytes 101 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 39.00-40.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 40.00-41.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 41.00-42.00 sec 12.0 MBytes 101 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 42.00-43.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 43.00-44.00 sec 12.0 MBytes 101 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 44.00-45.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 45.00-46.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 46.00-47.00 sec 12.0 MBytes 101 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 47.00-48.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 48.00-49.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 49.00-50.00 sec 12.0 MBytes 101 
Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 50.00-51.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 51.00-52.00 sec 12.0 MBytes 101 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 52.00-53.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 53.00-54.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 54.00-55.00 sec 12.0 MBytes 101 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 55.00-56.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 56.00-57.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 57.00-58.00 sec 12.0 MBytes 101 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 58.00-59.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 59.00-60.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 60.00-61.00 sec 12.0 MBytes 101 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 61.00-62.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 62.00-63.00 sec 12.0 MBytes 101 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 63.00-64.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 64.00-65.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 65.00-66.00 sec 12.0 MBytes 101 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 66.00-67.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 67.00-68.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 68.00-69.00 sec 12.0 MBytes 101 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 69.00-70.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 70.00-71.00 sec 12.0 MBytes 101 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 71.00-72.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 72.00-73.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 73.00-74.00 sec 12.0 MBytes 101 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 74.00-75.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > 
>>>>>>>> [ 5] 75.00-76.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 76.00-77.00 sec 12.0 MBytes 101 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 77.00-78.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 78.00-79.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 79.00-80.00 sec 12.0 MBytes 101 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 80.00-81.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 81.00-82.00 sec 12.0 MBytes 101 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 82.00-83.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 83.00-84.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 84.00-85.00 sec 12.0 MBytes 101 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 85.00-86.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 86.00-87.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 87.00-88.00 sec 12.0 MBytes 101 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 88.00-89.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 89.00-90.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 90.00-91.00 sec 12.0 MBytes 101 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 91.00-92.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 92.00-93.00 sec 12.0 MBytes 101 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 93.00-94.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 94.00-95.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 95.00-96.00 sec 12.0 MBytes 101 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 96.00-97.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 97.00-98.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 98.00-99.00 sec 12.0 MBytes 101 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 99.00-100.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 100.00-101.00 sec 
12.0 MBytes 101 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 101.00-102.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 102.00-103.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 103.00-104.00 sec 12.0 MBytes 101 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 104.00-105.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 105.00-106.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 106.00-107.00 sec 12.0 MBytes 101 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 107.00-108.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 108.00-109.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 109.00-110.00 sec 12.0 MBytes 101 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 110.00-111.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 111.00-112.00 sec 12.0 MBytes 101 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 112.00-113.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 113.00-114.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 114.00-115.00 sec 12.0 MBytes 101 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 115.00-116.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 116.00-117.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 117.00-118.00 sec 12.0 MBytes 101 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 118.00-119.00 sec 11.9 MBytes 99.6 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> [ 5] 119.00-120.00 sec 12.0 MBytes 101 Mbits/sec 0 290 > >>>> KBytes > >>>>>>>> - - - - - - - - - - - - - - - - - - - - - - - - - > >>>>>>>> [ ID] Interval Transfer Bitrate Retr > >>>>>>>> [ 5] 0.00-120.00 sec 1.40 GBytes 100 Mbits/sec 0 > >>>> sender > >>>>>>>> [ 5] 0.00-120.00 sec 1.40 GBytes 100 Mbits/sec > >>>> receiver > >>>>>>>> > >>>>>>>> iperf output from vm1 - ext0: > >>>>>>>> [ ID] Interval Transfer Bitrate Retr Cwnd > >>>>>>>> [ 5] 0.00-1.00 sec 12.0 MBytes 101 Mbits/sec 0 
150 > >>>> KBytes > >>>>>>>> [ 5] 1.00-2.00 sec 11.9 MBytes 99.6 Mbits/sec 0 150 > >>>> KBytes > >>>>>>>> [ 5] 2.00-3.00 sec 12.0 MBytes 101 Mbits/sec 0 150 > >>>> KBytes > >>>>>>>> [ 5] 3.00-4.00 sec 11.9 MBytes 99.6 Mbits/sec 127 118 > >>>> KBytes > >>>>>>>> [ 5] 4.00-5.00 sec 11.9 MBytes 99.6 Mbits/sec 0 130 > >>>> KBytes > >>>>>>>> [ 5] 5.00-6.00 sec 12.0 MBytes 101 Mbits/sec 0 130 > >>>> KBytes > >>>>>>>> [ 5] 6.00-7.00 sec 11.9 MBytes 99.6 Mbits/sec 0 130 > >>>> KBytes > >>>>>>>> [ 5] 7.00-8.00 sec 11.9 MBytes 99.6 Mbits/sec 96 160 > >>>> KBytes > >>>>>>>> [ 5] 8.00-9.00 sec 12.0 MBytes 101 Mbits/sec 0 160 > >>>> KBytes > >>>>>>>> [ 5] 9.00-10.00 sec 11.9 MBytes 99.6 Mbits/sec 0 160 > >>>> KBytes > >>>>>>>> [ 5] 10.00-11.00 sec 11.9 MBytes 99.6 Mbits/sec 118 130 > >>>> KBytes > >>>>>>>> [ 5] 11.00-12.00 sec 12.0 MBytes 101 Mbits/sec 0 130 > >>>> KBytes > >>>>>>>> [ 5] 12.00-13.00 sec 11.9 MBytes 99.6 Mbits/sec 0 130 > >>>> KBytes > >>>>>>>> [ 5] 13.00-14.00 sec 11.5 MBytes 96.5 Mbits/sec 3 4.07 > >>>> KBytes > >>>>>>>> [ 5] 14.00-15.00 sec 12.4 MBytes 104 Mbits/sec 93 178 > >>>> KBytes > >>>>>>>> [ 5] 15.00-16.00 sec 11.9 MBytes 99.6 Mbits/sec 0 178 > >>>> KBytes > >>>>>>>> [ 5] 16.00-17.00 sec 12.0 MBytes 101 Mbits/sec 0 178 > >>>> KBytes > >>>>>>>> [ 5] 17.00-18.00 sec 11.9 MBytes 99.6 Mbits/sec 0 178 > >>>> KBytes > >>>>>>>> [ 5] 18.00-19.00 sec 11.9 MBytes 99.6 Mbits/sec 138 130 > >>>> KBytes > >>>>>>>> [ 5] 19.00-20.00 sec 12.0 MBytes 101 Mbits/sec 0 130 > >>>> KBytes > >>>>>>>> [ 5] 20.00-21.00 sec 11.9 MBytes 99.6 Mbits/sec 0 130 > >>>> KBytes > >>>>>>>> [ 5] 21.00-22.00 sec 12.0 MBytes 101 Mbits/sec 0 130 > >>>> KBytes > >>>>>>>> [ 5] 22.00-23.00 sec 11.9 MBytes 99.6 Mbits/sec 0 130 > >>>> KBytes > >>>>>>>> [ 5] 23.00-24.00 sec 11.9 MBytes 99.6 Mbits/sec 0 130 > >>>> KBytes > >>>>>>>> [ 5] 24.00-25.00 sec 12.0 MBytes 101 Mbits/sec 96 195 > >>>> KBytes > >>>>>>>> [ 5] 25.00-26.00 sec 11.9 MBytes 99.6 Mbits/sec 0 195 > >>>> KBytes > >>>>>>>> [ 5] 
26.00-27.00 sec 11.9 MBytes 99.6 Mbits/sec 0 195 > >>>> KBytes > >>>>>>>> [ 5] 27.00-28.00 sec 12.0 MBytes 101 Mbits/sec 0 195 > >>>> KBytes > >>>>>>>> [ 5] 28.00-29.00 sec 11.9 MBytes 99.6 Mbits/sec 0 195 > >>>> KBytes > >>>>>>>> [ 5] 29.00-30.00 sec 11.9 MBytes 99.6 Mbits/sec 0 195 > >>>> KBytes > >>>>>>>> [ 5] 30.00-31.00 sec 12.0 MBytes 101 Mbits/sec 0 195 > >>>> KBytes > >>>>>>>> [ 5] 31.00-32.00 sec 11.9 MBytes 99.6 Mbits/sec 145 225 > >>>> KBytes > >>>>>>>> [ 5] 32.00-33.00 sec 12.0 MBytes 101 Mbits/sec 0 225 > >>>> KBytes > >>>>>>>> [ 5] 33.00-34.00 sec 11.9 MBytes 99.6 Mbits/sec 0 225 > >>>> KBytes > >>>>>>>> [ 5] 34.00-35.00 sec 11.9 MBytes 99.6 Mbits/sec 0 225 > >>>> KBytes > >>>>>>>> [ 5] 35.00-36.00 sec 12.0 MBytes 101 Mbits/sec 0 225 > >>>> KBytes > >>>>>>>> [ 5] 36.00-37.00 sec 11.9 MBytes 99.6 Mbits/sec 0 225 > >>>> KBytes > >>>>>>>> [ 5] 37.00-38.00 sec 11.9 MBytes 99.6 Mbits/sec 0 225 > >>>> KBytes > >>>>>>>> [ 5] 38.00-39.00 sec 12.0 MBytes 101 Mbits/sec 0 225 > >>>> KBytes > >>>>>>>> [ 5] 39.00-40.00 sec 11.9 MBytes 99.6 Mbits/sec 165 157 > >>>> KBytes > >>>>>>>> [ 5] 40.00-41.00 sec 11.9 MBytes 99.6 Mbits/sec 0 157 > >>>> KBytes > >>>>>>>> [ 5] 41.00-42.00 sec 12.0 MBytes 101 Mbits/sec 0 157 > >>>> KBytes > >>>>>>>> [ 5] 42.00-43.00 sec 11.9 MBytes 99.6 Mbits/sec 0 157 > >>>> KBytes > >>>>>>>> [ 5] 43.00-44.00 sec 12.0 MBytes 101 Mbits/sec 131 130 > >>>> KBytes > >>>>>>>> [ 5] 44.00-45.00 sec 11.9 MBytes 99.6 Mbits/sec 0 130 > >>>> KBytes > >>>>>>>> [ 5] 45.00-46.00 sec 11.9 MBytes 99.6 Mbits/sec 0 130 > >>>> KBytes > >>>>>>>> [ 5] 46.00-47.00 sec 12.0 MBytes 101 Mbits/sec 0 130 > >>>> KBytes > >>>>>>>> [ 5] 47.00-48.00 sec 11.9 MBytes 99.6 Mbits/sec 96 221 > >>>> KBytes > >>>>>>>> [ 5] 48.00-49.00 sec 11.9 MBytes 99.6 Mbits/sec 0 221 > >>>> KBytes > >>>>>>>> [ 5] 49.00-50.00 sec 12.0 MBytes 101 Mbits/sec 0 221 > >>>> KBytes > >>>>>>>> [ 5] 50.00-51.00 sec 11.9 MBytes 99.6 Mbits/sec 0 221 > >>>> KBytes > >>>>>>>> [ 5] 51.00-52.00 sec 12.0 
MBytes 101 Mbits/sec 0 221 > >>>> KBytes > >>>>>>>> [ 5] 52.00-53.00 sec 11.9 MBytes 99.6 Mbits/sec 0 221 > >>>> KBytes > >>>>>>>> [ 5] 53.00-54.00 sec 11.9 MBytes 99.6 Mbits/sec 164 155 > >>>> KBytes > >>>>>>>> [ 5] 54.00-55.00 sec 12.0 MBytes 101 Mbits/sec 0 155 > >>>> KBytes > >>>>>>>> [ 5] 55.00-56.00 sec 11.9 MBytes 99.6 Mbits/sec 0 155 > >>>> KBytes > >>>>>>>> [ 5] 56.00-57.00 sec 11.9 MBytes 99.6 Mbits/sec 0 155 > >>>> KBytes > >>>>>>>> [ 5] 57.00-58.00 sec 12.0 MBytes 101 Mbits/sec 0 155 > >>>> KBytes > >>>>>>>> [ 5] 58.00-59.00 sec 11.9 MBytes 99.6 Mbits/sec 0 155 > >>>> KBytes > >>>>>>>> [ 5] 59.00-60.00 sec 11.9 MBytes 99.6 Mbits/sec 0 155 > >>>> KBytes > >>>>>>>> [ 5] 60.00-61.00 sec 12.0 MBytes 101 Mbits/sec 114 130 > >>>> KBytes > >>>>>>>> [ 5] 61.00-62.00 sec 11.9 MBytes 99.6 Mbits/sec 0 130 > >>>> KBytes > >>>>>>>> [ 5] 62.00-63.00 sec 12.0 MBytes 101 Mbits/sec 0 130 > >>>> KBytes > >>>>>>>> [ 5] 63.00-64.00 sec 11.9 MBytes 99.6 Mbits/sec 0 130 > >>>> KBytes > >>>>>>>> [ 5] 64.00-65.00 sec 11.9 MBytes 99.6 Mbits/sec 0 130 > >>>> KBytes > >>>>>>>> [ 5] 65.00-66.00 sec 12.0 MBytes 101 Mbits/sec 96 142 > >>>> KBytes > >>>>>>>> [ 5] 66.00-67.00 sec 11.9 MBytes 99.6 Mbits/sec 0 142 > >>>> KBytes > >>>>>>>> [ 5] 67.00-68.00 sec 11.9 MBytes 99.6 Mbits/sec 0 142 > >>>> KBytes > >>>>>>>> [ 5] 68.00-69.00 sec 12.0 MBytes 101 Mbits/sec 0 142 > >>>> KBytes > >>>>>>>> [ 5] 69.00-70.00 sec 11.9 MBytes 99.6 Mbits/sec 0 142 > >>>> KBytes > >>>>>>>> [ 5] 70.00-71.00 sec 12.0 MBytes 101 Mbits/sec 0 142 > >>>> KBytes > >>>>>>>> [ 5] 71.00-72.00 sec 11.9 MBytes 99.6 Mbits/sec 0 142 > >>>> KBytes > >>>>>>>> [ 5] 72.00-73.00 sec 11.9 MBytes 99.6 Mbits/sec 105 131 > >>>> KBytes > >>>>>>>> [ 5] 73.00-74.00 sec 12.0 MBytes 101 Mbits/sec 0 131 > >>>> KBytes > >>>>>>>> [ 5] 74.00-75.00 sec 11.9 MBytes 99.6 Mbits/sec 0 131 > >>>> KBytes > >>>>>>>> [ 5] 75.00-76.00 sec 11.9 MBytes 99.6 Mbits/sec 0 131 > >>>> KBytes > >>>>>>>> [ 5] 76.00-77.00 sec 12.0 MBytes 101 Mbits/sec 0 131 
> >>>> KBytes > >>>>>>>> [ 5] 77.00-78.00 sec 11.9 MBytes 99.6 Mbits/sec 0 131 > >>>> KBytes > >>>>>>>> [ 5] 78.00-79.00 sec 11.9 MBytes 99.6 Mbits/sec 97 229 > >>>> KBytes > >>>>>>>> [ 5] 79.00-80.00 sec 12.0 MBytes 101 Mbits/sec 0 229 > >>>> KBytes > >>>>>>>> [ 5] 80.00-81.00 sec 11.9 MBytes 99.6 Mbits/sec 0 229 > >>>> KBytes > >>>>>>>> [ 5] 81.00-82.00 sec 12.0 MBytes 101 Mbits/sec 0 229 > >>>> KBytes > >>>>>>>> [ 5] 82.00-83.00 sec 11.9 MBytes 99.6 Mbits/sec 0 229 > >>>> KBytes > >>>>>>>> [ 5] 83.00-84.00 sec 11.9 MBytes 99.6 Mbits/sec 0 229 > >>>> KBytes > >>>>>>>> [ 5] 84.00-85.00 sec 12.0 MBytes 101 Mbits/sec 0 229 > >>>> KBytes > >>>>>>>> [ 5] 85.00-86.00 sec 11.9 MBytes 99.6 Mbits/sec 170 163 > >>>> KBytes > >>>>>>>> [ 5] 86.00-87.00 sec 11.9 MBytes 99.6 Mbits/sec 0 163 > >>>> KBytes > >>>>>>>> [ 5] 87.00-88.00 sec 12.0 MBytes 101 Mbits/sec 0 163 > >>>> KBytes > >>>>>>>> [ 5] 88.00-89.00 sec 11.9 MBytes 99.6 Mbits/sec 0 163 > >>>> KBytes > >>>>>>>> [ 5] 89.00-90.00 sec 11.9 MBytes 99.6 Mbits/sec 0 163 > >>>> KBytes > >>>>>>>> [ 5] 90.00-91.00 sec 12.0 MBytes 101 Mbits/sec 0 163 > >>>> KBytes > >>>>>>>> [ 5] 91.00-92.00 sec 11.9 MBytes 99.6 Mbits/sec 0 163 > >>>> KBytes > >>>>>>>> [ 5] 92.00-93.00 sec 12.0 MBytes 101 Mbits/sec 121 130 > >>>> KBytes > >>>>>>>> [ 5] 93.00-94.00 sec 11.9 MBytes 99.6 Mbits/sec 0 130 > >>>> KBytes > >>>>>>>> [ 5] 94.00-95.00 sec 11.9 MBytes 99.6 Mbits/sec 0 130 > >>>> KBytes > >>>>>>>> [ 5] 95.00-96.00 sec 12.0 MBytes 101 Mbits/sec 0 130 > >>>> KBytes > >>>>>>>> [ 5] 96.00-97.00 sec 11.9 MBytes 99.6 Mbits/sec 0 130 > >>>> KBytes > >>>>>>>> [ 5] 97.00-98.00 sec 11.9 MBytes 99.6 Mbits/sec 0 130 > >>>> KBytes > >>>>>>>> [ 5] 98.00-99.00 sec 12.0 MBytes 101 Mbits/sec 0 130 > >>>> KBytes > >>>>>>>> [ 5] 99.00-100.00 sec 11.9 MBytes 99.6 Mbits/sec 0 130 > >>>> KBytes > >>>>>>>> [ 5] 100.00-101.00 sec 12.0 MBytes 101 Mbits/sec 0 130 > >>>> KBytes > >>>>>>>> [ 5] 101.00-102.00 sec 11.9 MBytes 99.6 Mbits/sec 96 176 > >>>> KBytes > 
>>>>>>>> [ 5] 102.00-103.00 sec 11.9 MBytes 99.6 Mbits/sec 0 176 > >>>> KBytes > >>>>>>>> [ 5] 103.00-104.00 sec 12.0 MBytes 101 Mbits/sec 0 176 > >>>> KBytes > >>>>>>>> [ 5] 104.00-105.00 sec 11.9 MBytes 99.6 Mbits/sec 0 176 > >>>> KBytes > >>>>>>>> [ 5] 105.00-106.00 sec 11.9 MBytes 99.6 Mbits/sec 0 176 > >>>> KBytes > >>>>>>>> [ 5] 106.00-107.00 sec 12.0 MBytes 101 Mbits/sec 0 176 > >>>> KBytes > >>>>>>>> [ 5] 107.00-108.00 sec 11.9 MBytes 99.6 Mbits/sec 0 176 > >>>> KBytes > >>>>>>>> [ 5] 108.00-109.00 sec 11.9 MBytes 99.6 Mbits/sec 0 176 > >>>> KBytes > >>>>>>>> [ 5] 109.00-110.00 sec 12.0 MBytes 101 Mbits/sec 0 176 > >>>> KBytes > >>>>>>>> [ 5] 110.00-111.00 sec 11.9 MBytes 99.6 Mbits/sec 130 130 > >>>> KBytes > >>>>>>>> [ 5] 111.00-112.00 sec 12.0 MBytes 101 Mbits/sec 0 130 > >>>> KBytes > >>>>>>>> [ 5] 112.00-113.00 sec 11.9 MBytes 99.6 Mbits/sec 0 130 > >>>> KBytes > >>>>>>>> [ 5] 113.00-114.00 sec 11.9 MBytes 99.6 Mbits/sec 0 130 > >>>> KBytes > >>>>>>>> [ 5] 114.00-115.00 sec 12.0 MBytes 101 Mbits/sec 0 130 > >>>> KBytes > >>>>>>>> [ 5] 115.00-116.00 sec 11.9 MBytes 99.6 Mbits/sec 0 130 > >>>> KBytes > >>>>>>>> [ 5] 116.00-117.00 sec 11.9 MBytes 99.6 Mbits/sec 0 130 > >>>> KBytes > >>>>>>>> [ 5] 117.00-118.00 sec 12.0 MBytes 101 Mbits/sec 0 130 > >>>> KBytes > >>>>>>>> [ 5] 118.00-119.00 sec 11.9 MBytes 99.6 Mbits/sec 0 130 > >>>> KBytes > >>>>>>>> [ 5] 119.00-120.00 sec 12.0 MBytes 101 Mbits/sec 96 237 > >>>> KBytes > >>>>>>>> - - - - - - - - - - - - - - - - - - - - - - - - - > >>>>>>>> [ ID] Interval Transfer Bitrate Retr > >>>>>>>> [ 5] 0.00-120.00 sec 1.40 GBytes 100 Mbits/sec 2397 > >>>> sender > >>>>>>>> [ 5] 0.00-120.00 sec 1.40 GBytes 100 Mbits/sec > >>>> receiver > >>>>>>>> > >>>>>>>> So if you don't have anything against it I would upload v3 which > >>>> will default to 0, meaning disabled. > >>>>>>> > >>>>>>> Thanks Ales for sharing the test result! 
> >>>>>>> Looking at the two tests, the one with mac-binding removed periodically (for ext0) had occasional retransmissions and the window size couldn't reach the peak, while the other one without mac-binding deletion had no retransmissions and kept the window size constant at 290 KB. However, they end up with the same throughput number, so maybe the disturbance was not significant enough to affect the throughput for this comparison. I wonder if there are more obvious differences if tested with a higher bandwidth environment, e.g. with 10G, 25G or even higher line rate. I will find some time to test this in our data center environment.
> >>>>>>
> >>>>>> I actually tried it with the maximum that my computer can handle. It was around 18G for both flows and the results were more or less the same. The throughput was stable and there were some retransmissions,
> >>>>>> but overall the connection looked ok.
> >>>>>>
> >>>>>>>
> >>>>>>> On the other hand, as mentioned in an earlier reply, if the mac-binding deletions at relatively long intervals don't affect overall performance, we shouldn't even need to check the idle_age of OVS flows. It would simplify the implementation a lot if northd just checks a timestamp and deletes expired entries, without checking idle_age at all. Maintaining the ownership of the mac-binding records and doing all the idle_age checks doesn't seem to provide us any extra benefit, right? Please also see my response to Dumitru's comment below.
> >>>>>>
> >>>>>> That is actually a good point, if we can prove through testing that removal of MAC binding does not affect flow through others, which from the results above seems to be the case, we can probably skip the whole ownership.
> >>>>>> Which would really reduce it to checking if something went over the threshold in northd probably. I am planning to make a bigger test with more iperf flows (100) in a similar setup and also some tests of latency to see how much that is affected by the MAC binding removal.
> >>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Ales
> >>>>>>>>
> >>>>>>>> On Mon, Jun 27, 2022 at 5:53 PM Dumitru Ceara <[email protected]> wrote:
> >>>>>>>>>
> >>>>>>>>> On 6/24/22 22:56, Han Zhou wrote:
> >>>>>>>>>> On Fri, Jun 24, 2022 at 12:41 PM Numan Siddique <[email protected]> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> On Fri, Jun 24, 2022 at 11:49 AM Han Zhou <[email protected]> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Fri, Jun 24, 2022 at 1:11 AM Ales Musil <[email protected]> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Hi Han,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> after our discussion I did the suggested test and the throughput does not seem to be affected,
> >>>>>>>>>>>>> I did the test with aging set to 2 sec, and during the test period (360 sec) the MAC binding was removed multiple times.
> >>>>>>>>>>>>> There were some dropped packets, but the traffic was maintained with minimal disturbance.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks for sharing the result! I think different applications may react to
> >>>>>>>>>>>> this kind of disturbance differently. Some may be sensitive to packet loss.
> >>>>>>>>>>>> In addition, I believe this would also incur megaflow cache miss and
> >>>>>>>>>>>> trigger OVS userspace processing in the middle of a flow.
> >>>>>>>>>>>> May I know the traffic pattern of your test? Did you measure with iperf
> >>>>>>>>>>>> during the test? Could you share the numbers with vs. without the drops?
> >>>>>>>>>>>> > >>>>>>>>>>>> On the other hand, if such random disturbance is not considered > >>>> harmful > >>>>>>>>>> for > >>>>>>>>>>>> some deployment, then I would also question the value of doing > >>>> all those > >>>>>>>>>>>> OVS flow idle_age checkings on the *owner* chassis. There can > >>>> be lots of > >>>>>>>>>>>> chassis consuming the same mac-binding entry but we are now > >>>> checking "at > >>>>>>>>>>>> least one of them is not using the entry recently", which > >>>> doesn't sound > >>>>>>>>>> too > >>>>>>>>>>>> different from just blindly expiring the entries without > >>>> checking > >>>>>>>>>> anything, > >>>>>>>>>>>> and let it recreate if someone still needs it - if the minimal > >>>>>>>>>> disturbance > >>>>>>>>>>>> is acceptable in such environment. ovn-northd can do this > >>>> periodical > >>>>>>>>>> check > >>>>>>>>>>>> easily and clean the expired entries, correct? > >>>>>>>>>>>> > >>>>>>>>>>>> Thanks, > >>>>>>>>>>>> Han > >>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>> Ales > >>>>>>>>>>>>> > >>>>>>>>>>>>> On Wed, Jun 22, 2022 at 9:51 AM Ales Musil < [email protected]> > >>>> wrote: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> On Wed, Jun 22, 2022 at 9:21 AM Han Zhou <[email protected] > > >>>> wrote: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> On Fri, Jun 17, 2022 at 2:08 AM Ales Musil < > >>>> [email protected]> > >>>>>>>>>> wrote: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Add MAC binding aging mechanism, that > >>>>>>>>>>>>>>>> should take care of stale MAC bindings. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> The mechanism works on "ownership" of the > >>>>>>>>>>>>>>>> MAC binding row. The chassis that creates > >>>>>>>>>>>>>>>> the row is then checking if the "idle_age" > >>>>>>>>>>>>>>>> of the flow is over the aging threshold. > >>>>>>>>>>>>>>>> In that case the MAC binding is removed > >>>>>>>>>>>>>>>> from database. 
The "owner" might change > >>>>>>>>>>>>>>>> when another chassis sees an update of the > >>>>>>>>>>>>>>>> MAC address. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> This approach has a downside: the chassis > >>>>>>>>>>>>>>>> that "owns" the MAC binding might not actually be > >>>>>>>>>>>>>>>> the one that is using it actively. This > >>>>>>>>>>>>>>>> might lead to some delays in packet flow when > >>>>>>>>>>>>>>>> the row is removed. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Hi Han, thank you for your input. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Thanks Ales for working on this! Stale entries in the > >>>> MAC_Binding > >>>>>>>>>> table > >>>>>>>>>>>> were a big TODO of OVN and a difficult problem. It is great to > >>>> see a > >>>>>>>>>>>> solution finally, and I think utilizing the "idle_age" is > >>>> brilliant. > >>>>>>>>>> Before > >>>>>>>>>>>> reviewing it in more detail, I'd like to discuss the "downside" > >>>> first. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> I think the "downside" here is indeed a problem with this > >>>> approach. > >>>>>>>>>> The > >>>>>>>>>>>> MAC binding in OVN is in fact the ARP cache (or neighbour > >>>> table) of the > >>>>>>>>>>>> router, but the OVN logical router is distributed (except for > >>>> gateway-router > >>>>>>>>>>>> and DGP), so in most cases by nature of OVN LR the user of a MAC > >>>> binding > >>>>>>>>>>>> wouldn't be the one that "owns" it. It would be a big dataplane > >>>> performance > >>>>>>>>>>>> impact: think about a chassis that has a flow with high > >>>> packet throughput and > >>>>>>>>>>>> suddenly needs to pause and wait for ovn-controller > >>>> (and SB DB) > >>>>>>>>>> to > >>>>>>>>>>>> complete the ARP resolution process.
I saw this being pointed > >>>> out and > >>>>>>>>>>>> discussed in the first version, but I'd raise more attention to > >>>> it, > >>>>>>>>>> because > >>>>>>>>>>>> the problem introduced would be much bigger than the stale > >>>> entries in > >>>>>>>>>> the > >>>>>>>>>>>> MAC binding table. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> I think the proposal from Daniel that transfers owner with > >>>> "expire > >>>>>>>>>>>> timestamp" set would help, but I am also thinking that since > >>>> the logical > >>>>>>>>>>>> router is distributed, it may be unreasonable to have an owner > >>>> at all. > >>>>>>>>>> My > >>>>>>>>>>>> suggestion is, instead of assigning "owner" for each entry, a > >>>> central > >>>>>>>>>>>> controller can just be responsible for checking if any chassis > >>>> still > >>>>>>>>>> uses > >>>>>>>>>>>> the entry and removing it when no one uses it anymore. > >>>> Naturally the > >>>>>>>>>>>> central controller can be hosted in ovn-northd. Here is the > >>>> detailed > >>>>>>>>>>>> algorithm I am thinking: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> * when an entry is created (by any ovn-controller), an > >>>>>>>>>> expire_timestamp > >>>>>>>>>>>> is set (e.g. 10 min from now - can be configurable) > >>>>>>>>>>>>>>> * Each ovn-controller: check the entries it uses and if the > >>>>>>>>>>>> expire_timestamp of the entry is past, but its own "idle_age" > >>>> indicates > >>>>>>>>>> the > >>>>>>>>>>>> entry is still needed, it will update the SB DB entry with a > > new > >>>>>>>>>>>> expire_timestamp. Note: before updating the SB DB, > >>>> ovn-controller needs > >>>>>>>>>> a > >>>>>>>>>>>> random delay, to avoid update storm to SB unnecessarily - in > >>>> most cases > >>>>>>>>>>>> only one ovn-controller would update/refresh the SB DB when an > >>>> entry is > >>>>>>>>>>>> expired. 
> >>>>>>>>>>>>>>> * ovn-northd periodically checks if there are entries whose > >>>>>>>>>> expire_timestamp is more than 1 min in the past (this is related to the > >>>> random > >>>>>>>>>> delay of ovn-controller, may be configurable, too); if so, it will go > >>>> ahead and > >>>>>>>>>> delete the entry. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> What do you think? > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> This is actually pretty close to the first approach that was > >>>>>>>>>> suggested > >>>>>>>>>>>> in the BZ [0] for this. However your suggestion would cause > >>>> less SB > >>>>>>>>>> traffic > >>>>>>>>>>>> which is great. I would still be a bit worried that in case of > >>>> big > >>>>>>>>>> setups > >>>>>>>>>>>> there could be a lot of controllers trying to postpone the > >>>> deletion of > >>>>>>>>>> the > >>>>>>>>>>>> particular MAC binding. We are running some scale tests with > >>>> the v2 > >>>>>>>>>> patch > >>>>>>>>>>>> set, so we should have some answers whether the downside is > >>>> causing any > >>>>>>>>>>>> visible trouble. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> I will definitely discuss this suggestion with the rest of > >>>> the team. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> In addition, such a change may still be risky in large scale > >>>>>>>>>>>> environments, and I think it is worth experimenting first with a > >>>> knob to > >>>>>>>>>>>> enable it (and disabled by default). > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> That would be in line with what Mark suggested, a special > >>>> value that > >>>>>>>>>>>> disables mac binding aging, e.g. threshold=0, which could be the > >>>> default. > >>>>>>>>>>>>>> > >>>>>>>>> > >>>>>>>>> +1 for keeping this disabled by default for now. > >>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> Thanks Ales for working on this. I haven't reviewed the patch > >>>> series. > >>>>>>>>>>> Just providing some comments and my 2 cents. > >>>>>>>>>>> > >>>>>>>>>>> 1.
If it's possible I'd avoid querying OVS to get the flow > >>>> stats and > >>>>>>>>>>> determine if a mac binding entry is stale/expired or not. > >>>>>>>>>>> If there is no other way, then I'm fine with it. > >>>>>>>>>>> > >>>>>>>>>>> 2. Before taking that approach we can perhaps explore another > >>>> way to > >>>>>>>>>>> do it. My initial thought is: > >>>>>>>>>>> - Each mac binding is owned by one ovn-controller > >>>> (probably > >>>>>>>>>>> the one which learnt it) > >>>>>>>>>>> - And periodically, it will generate an arp request for > > the > >>>>>>>>>>> learnt IP of the mac binding entry. > >>>>>>>>> > >>>>>>>>> I think if we go with this approach it's probably desirable that > >>>> these > >>>>>>>>> periodic probes are unicast instead of regular broadcast ARP > >>>> requests. > >>>>>>>>> We also need the CMS (or some automatic mechanism) to provision a > >>>> unique > >>>>>>>>> per-chassis source MAC to be used for such packets. > >>>>>>>>> > >>>>>>>>>>> - If that mac binding is still intact, we will receive an > >>>> arp > >>>>>>>>>>> response. And ovn-controller handling this arp response will > >>>> mark > >>>>>>>>>>> this > >>>>>>>>>>> mac binding entry as still active. > >>>>>>>>>>> - If no response, then this mac binding entry is deleted. > >>>>>>>>>>> > >>>>>>>>>>> I don't think this will be easy to implement as presently we > > first > >>>>>>>>>>> check if we have already learnt the mac binding entry or not (using > >>>> ovn > >>>>>>>>>>> action lookup_arp/lookup_nd). > >>>>>>>>>>> When we receive the arp response from the mac binding ip, then > > we > >>>>>>>>>>> should still send the packet to ovn-controller even if > >>>> lookup_arp/nd > >>>>>>>>>>> returns success. > >>>>>>>>>>> > >>>>>>>>>>> What do you all think? Does this seem doable? > >>>>>>>>>> > >>>>>>>>>> Thanks Numan. I think 2) is probably a good way to go.
It is > >>>> different from > >>>>>>>>>> the idea of deleting the entries not being used, but instead just > >>>> deleting > >>>>>>>>>> entries that are not valid any more. In theory it is possible > >>>> that there > >>>>>>>>>> will still be lots of valid but unused entries in the DB, but in > >>>> practice > >>>>>>>>>> the number of alive end-points are usually limited, so valid but > >>>> unused > >>>>>>>>>> entries shouldn't be harmful enough. There is no dataplane > >>>> concerns with > >>>>>>>>>> this approach, and the control plane cost also seems not > >>>> significant, so I > >>>>>>>>>> think it is something worth trying. > >>>>>>>>>> > >>>>>>>>> > >>>>>>>>> I think that in the end, if we want a solution that works for all > >>>> cases, > >>>>>>>>> we probably need to implement both approaches. In essence this > >>>> seems to > >>>>>>>>> correspond to implementing mechanisms (1) and (2) from "2.3.2.1 > > ARP > >>>>>>>>> Cache Validation" in RFC 1122: > >>>>>>>>> > >>>>>>>>> https://datatracker.ietf.org/doc/html/rfc1122#page-22 > >>>>>>>>> > >>>>>>>>> Any of these is better than the current behavior so it shouldn't > >>>> matter > >>>>>>>>> too much which one we take first as long as there's no dataplane > >>>> impact. > >>>>>>>>> > >>>>>>> > >>>>>>> Thanks Dumitru for the reference. It seems none of the approaches > >>>> mentioned in the RFC consider if the ARP entry is in use or not. (1) is > >>>> merely implementing a timeout for each entry (2) is to delete only if > > the > >>>> entry is not valid any more (what Numan suggested). I think we can > > start > >>>> with (1), with configurable timeout (and 0 means never timeout, like > > it is > >>>> today), and (2) is a more advanced approach but also a more complex > >>>> implementation - we can implement it if (1) is not sufficient for all > > the > >>>> use cases. 
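A minimal sketch of mechanism (1) as described above: a plain per-entry timeout with a configurable threshold, where 0 means entries never time out. It is only an in-memory model for illustration. The names MacBinding, created_at and expire_mac_bindings are hypothetical and are not taken from the patch series or from the OVN schema.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical key for a binding: (logical_port, ip).
Key = Tuple[str, str]


@dataclass
class MacBinding:
    mac: str
    created_at: float  # assumed timestamp in seconds, set when the entry is learnt


def expire_mac_bindings(table: Dict[Key, MacBinding], now: float,
                        threshold: float) -> List[Key]:
    """Delete entries older than `threshold` seconds and return their keys.

    A threshold of 0 disables aging, so nothing is ever removed.
    """
    if threshold <= 0:
        return []
    expired = [key for key, binding in table.items()
               if now - binding.created_at >= threshold]
    for key in expired:
        del table[key]
    return expired
```

Note that this check never looks at idle_age, so an entry that is still in use is removed as well and has to be learnt again, which is exactly the disturbance measured in the tests discussed in this thread.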
> >>>>>> > >>>>>> > >>>>>> In the 1) they are mentioning refresh when we observe ARP for the > > same > >>>> entry, for that we would probably require additional controller action > > that > >>>> would just bump the timestamp or something like that. > >>>>>> The 2) can be approached differently, a) Remove the entry > > periodically > >>>> or when it does not respond. b) Remove the entry when it does not > > respond > >>>> and it timed out. > >>>>>> The a) does not have any added value to the overall process as it > > would > >>>> be removed nevertheless. > >>>>>> The b) has the disadvantage that the MAC binding table would keep > >>>> destinations that are still reachable, but might not be used at all. > >>>>>> > >>>>>> Anyway this approach is more up to discussion as you have written > > when > >>>> we find out that the first part does not prove to be efficient enough. > >>>>>> > >>>>>> Thanks, > >>>>>> Ales > >>>>>> > >>>>>>> > >>>>>>> > >>>>>>> Thanks, > >>>>>>> Han > >>>>>>> > >>>>>>>>> Ales, would it also be possible to test your implementation with > >>>>>>>>> multiple traffic streams between VIFs (more than 2 MAC_Bindings in > >>>> use) > >>>>>>>>> to make sure that openflow changes due to an expiring MAC_Binding > >>>> do not > >>>>>>>>> affect unrelated sessions (due to datapath flow > >>>> recalculation/eviction)? 
> >>>>>>>>> > >>>>>>>>> Thanks, > >>>>>>>>> Dumitru > >>>>>>>>> > >>>>>>>>>> Thanks, > >>>>>>>>>> Han > >>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> Numan > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>>>> Han > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>>> Ales > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> [0] https://bugzilla.redhat.com/2084668#c2 > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> The threshold can be configured in > >>>>>>>>>>>>>>>> NB_global table with key "mac_binding_age_threshold" > >>>>>>>>>>>>>>>> in seconds with default value being 60. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> The test case is present as separate patch of the series. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Add delay to ARP response processing to prevent > >>>>>>>>>>>>>>>> race condition between multiple controllers > >>>>>>>>>>>>>>>> that received the same ARP. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Ales Musil (6): > >>>>>>>>>>>>>>>> Add chassis column to MAC_Binding table > >>>>>>>>>>>>>>>> Add MAC binding aging mechanism > >>>>>>>>>>>>>>>> Add stopwatch for MAC binding aging > >>>>>>>>>>>>>>>> Allow the MAC binding age threshold to be configurable > >>>>>>>>>>>>>>>> ovn.at: Add test case covering the MAC binding aging > >>>>>>>>>>>>>>>> pinctrl.c: Add delay after ARP packet > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> controller/automake.mk | 4 +- > >>>>>>>>>>>>>>>> controller/mac-binding-aging.c | 241 > >>>>>>>>>>>> +++++++++++++++++++++++++++++++++ > >>>>>>>>>>>>>>>> controller/mac-binding-aging.h | 32 +++++ > >>>>>>>>>>>>>>>> controller/ovn-controller.c | 32 +++++ > >>>>>>>>>>>>>>>> controller/pinctrl.c | 73 ++++++++-- > >>>>>>>>>>>>>>>> northd/northd.c | 12 ++ > >>>>>>>>>>>>>>>> northd/ovn-northd.c | 2 +- > >>>>>>>>>>>>>>>> ovn-nb.xml | 5 + > >>>>>>>>>>>>>>>> ovn-sb.ovsschema | 6 +- > >>>>>>>>>>>>>>>> ovn-sb.xml | 5 + > >>>>>>>>>>>>>>>> tests/ovn.at | 212 > 
>>>>>>>>>> +++++++++++++++++++++++++++-- > >>>>>>>>>>>>>>>> 11 files changed, 595 insertions(+), 29 deletions(-) > >>>>>>>>>>>>>>>> create mode 100644 controller/mac-binding-aging.c > >>>>>>>>>>>>>>>> create mode 100644 controller/mac-binding-aging.h > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> -- > >>>>>>>>>>>>>>>> 2.35.3 > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> _______________________________________________ > >>>>>>>>>>>>>>>> dev mailing list > >>>>>>>>>>>>>>>> [email protected] > >>>>>>>>>>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> -- > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Ales Musil > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Senior Software Engineer - OVN Core > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Red Hat EMEA > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> [email protected] IM: amusil
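The expire_timestamp scheme proposed earlier in this thread (timestamp set on creation, refreshed by an ovn-controller that still uses the entry after a random delay, swept by ovn-northd once the grace period has passed) can be sketched roughly as below. This is an in-memory model under stated assumptions, not the patch series' implementation: the function names, the TTL and GRACE constants, the dict standing in for the SB MAC_Binding table, and the "idle_age < TTL means still in use" rule are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Dict, List

TTL = 600.0    # lifetime of a new or refreshed entry (the "10 min" example)
GRACE = 60.0   # how long northd waits past expiry before deleting ("1 min")


@dataclass
class Entry:
    mac: str
    expire_timestamp: float


# Keyed by IP; a stand-in for the SB MAC_Binding table.
Table = Dict[str, Entry]


def learn(table: Table, ip: str, mac: str, now: float) -> None:
    """Any ovn-controller creating an entry stamps it with an expiry time."""
    table[ip] = Entry(mac, now + TTL)


def pick_refresh_time(now: float, rng: random.Random) -> float:
    """Random delay, kept below GRACE, so controllers do not all write at once."""
    return now + rng.uniform(0.0, GRACE / 2)


def controller_refresh(table: Table, ip: str, now: float,
                       idle_age: float) -> bool:
    """Run when the random delay fires: refresh only if still expired and in use."""
    entry = table.get(ip)
    if entry is None or entry.expire_timestamp > now:
        return False  # deleted, or another controller already refreshed it
    if idle_age >= TTL:
        return False  # assumption: this chassis has not used the entry recently
    entry.expire_timestamp = now + TTL
    return True


def northd_sweep(table: Table, now: float) -> List[str]:
    """Delete entries whose expiry passed more than GRACE seconds ago."""
    stale = [ip for ip, entry in table.items()
             if now - entry.expire_timestamp > GRACE]
    for ip in stale:
        del table[ip]
    return stale
```

Because each controller re-checks the timestamp when its delay fires, in most cases only the first one writes to the SB DB and the rest see an already refreshed entry, which is the point of the random delay in the proposal.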
