On Fri, May 26, 2023 at 7:58 AM Han Zhou <[email protected]> wrote:
> On Thu, May 25, 2023 at 9:19 AM Ilya Maximets <[email protected]> wrote:
> > On 5/25/23 14:08, Ales Musil via discuss wrote:
> > > Hi,
> > >
> > > to improve the MAC binding aging mechanism, we need a way to ensure that rows which are still in use are preserved. This doesn't happen with the current implementation.
> > >
> > > I propose the following solution, which should solve the issue; any questions or comments are welcome. If there isn't anything major that would block this approach, I would start implementing it so it can be available in 23.09.
> > >
> > > For the approach itself:
> > >
> > > Add a "mac_cache_use" action into the "lr_in_learn_neighbor" table (only on the flow that continues on a known MAC binding):
> > >
> > > match=(REGBIT_LOOKUP_NEIGHBOR_RESULT == 1 || REGBIT_LOOKUP_NEIGHBOR_IP_RESULT == 0), action=(next;)
> > > ->
> > > match=(REGBIT_LOOKUP_NEIGHBOR_RESULT == 1 || REGBIT_LOOKUP_NEIGHBOR_IP_RESULT == 0), action=(mac_cache_use; next;)
> > >
> > > The "mac_cache_use" action would translate to a resubmit into a separate table with one flow per MAC binding, as follows:
> > >
> > > match=(ip.src=<MB_IP>, eth.src=<MB_MAC>, datapath=<MB_Datapath>), action=(drop;)
>
> It is possible that some workload has heavy traffic in the ingress direction only, such as some UDP streams, while not sending anything out for a long interval. So I am not sure whether matching on "src" only would be sufficient.
>
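To make it concrete which traffic would refresh an entry with the src-only match, here is a rough Python model of the per-binding table (purely illustrative; the real thing is OpenFlow state maintained by ovn-controller/ovs-vswitchd, and all names and addresses below are made up):

```python
# Illustrative model only: shows which traffic refreshes a binding when the
# per-binding flow matches on ip.src/eth.src, as in the proposal above.
import time

class MacCacheTable:
    def __init__(self):
        # (datapath, ip, mac) -> time of the last packet that hit the flow
        self.last_hit = {}

    def add_binding(self, datapath, ip, mac):
        self.last_hit[(datapath, ip, mac)] = time.monotonic()

    def packet(self, datapath, src_ip, src_mac):
        # Mirrors match=(ip.src=<MB_IP>, eth.src=<MB_MAC>, datapath=<MB_Datapath>):
        # only traffic *sent by* the bound IP/MAC bumps the entry.
        key = (datapath, src_ip, src_mac)
        if key in self.last_hit:
            self.last_hit[key] = time.monotonic()

    def idle_age(self, datapath, ip, mac):
        # Rough equivalent of the OpenFlow idle_age counter for that flow.
        return time.monotonic() - self.last_hit[(datapath, ip, mac)]


table = MacCacheTable()
table.add_binding("lr0", "10.0.0.5", "00:00:00:00:00:05")

# Egress traffic from 10.0.0.5 refreshes the entry...
table.packet("lr0", "10.0.0.5", "00:00:00:00:00:05")
# ...but ingress-only traffic towards 10.0.0.5 (src is the remote peer) does
# not, which is the UDP-stream case raised above.
table.packet("lr0", "192.168.1.1", "00:00:00:00:00:01")
print(table.idle_age("lr0", "10.0.0.5", "00:00:00:00:00:05"))
```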
Using table 67 (for the "dst" check) still has the black-hole problem. I'm not sure there is a universal solution that would satisfy UDP traffic that flows only one way, whichever direction that is. Also, the negative effect of re-ARPing on UDP traffic should be much smaller than on connection-oriented protocols.

> > One concern here would be that it will likely cause a packet clone in the datapath just to immediately drop it. So, it might have a noticeable performance impact.
>
> +1. We need to be careful to avoid any dataplane performance impact, which doesn't sound justified for the value.

We can test whether there is a clone or not; if there is, we might need to include it as a regular table in the pipeline.

> > > This should bump the statistics every time for the correct MAC binding. In ovn-controller we could periodically dump the flows from this table. The period would be set to MIN(mac_binding_age_threshold/2) across all local datapaths. The dump would happen from a separate thread with its own rconn to prevent backlogging issues. The thread would receive mapped data from an I-P node that keeps track of the mapping datapath -> cookies -> MAC bindings. This allows us to avoid constant lookups, at the cost of keeping track of all local MAC bindings. To save some computation time, this I-P node could be relevant only for datapaths that actually have the threshold set.
> > >
> > > If the "idle_age" of a particular flow is smaller than the datapath's "mac_binding_age_threshold", it means the binding is still in use. To prevent a lot of updates when the traffic is still relevant on multiple controllers, we would check whether the timestamp is older than the "dump period"; if not, we don't have to update it, because someone else did.
>
> Thanks for trying to reduce the number of updates to the SB DB, but I still have some concerns about this. In theory, to prevent a record from being deleted while it is still used, at least one timestamp update is required per threshold for each record. Even if we bundle the updates from each node, assume that the workloads that own the IP/MAC of the mac_binding records are distributed across 1000 nodes and the aging threshold is 30s: there will still be ~30 updates/s (if we can evenly distribute the updates from different nodes). That's still a lot, which may keep the SB server and all the ovn-controllers busy just with these messages. If the aging threshold is set to 300s (5 min), it may look better: ~3 updates/s, but this could still make up the major part of the SB <-> ovn-controller messages. E.g., in an ovn-k8s deployment the cluster LR is distributed on all nodes, so all nodes would need to monitor all mac-binding timestamp updates related to the cluster LR, which means all mac-binding updates from all nodes. In reality the number of messages may be doubled if we use the proposed dump-and-check interval of mac_binding_age_threshold/2.

Arguably, a 30-second threshold is way too low. You are right that the double update might happen in some cases; that can still be mitigated by shifting the timestamp check to, for example, 3/4 of the threshold, which should still give us enough time to perform the update while greatly reducing the chance of doing it twice. One thing to keep in mind is that these updates would happen only for MAC bindings that have aging enabled.
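To put rough numbers on this, here is a quick sketch of the math above and of the 3/4-threshold refresh check (illustrative only; the node count and thresholds are the examples from this thread, not measurements):

```python
# Back-of-the-envelope for the expected SB timestamp update rate.
def sb_updates_per_second(num_bindings, threshold_s):
    # Each record needs at least one timestamp refresh per threshold to
    # survive aging, regardless of which chassis performs it.
    return num_bindings / threshold_s

print(sb_updates_per_second(1000, 30))   # ~33 updates/s with a 30s threshold
print(sb_updates_per_second(1000, 300))  # ~3 updates/s with a 300s threshold

def should_refresh(idle_age_s, timestamp_age_s, threshold_s):
    """Decide whether this controller should bump the MAC_Binding timestamp.

    idle_age_s:      idle_age of the per-binding flow on this chassis
    timestamp_age_s: how old the row's timestamp already is in the SB DB
    threshold_s:     mac_binding_age_threshold of the datapath
    """
    still_in_use = idle_age_s < threshold_s
    # Only refresh once the existing timestamp is older than 3/4 of the
    # threshold; if another chassis refreshed it recently, skip the update.
    stale_enough = timestamp_age_s > 0.75 * threshold_s
    return still_in_use and stale_enough
```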
For some parts of the cluster that are fixed enough, or where the MAC binding table grows very slowly, it might not make sense to enable aging at all, or it could be enabled with a pretty big threshold, e.g. 3000s.

> So, I'd evaluate both the dataplane and control plane cost before going for a formal implementation.

The control plane impact is really configuration dependent, which is fine IMO. If we document the possible outcomes, I think it should be safe. I will certainly try to minimize the impact as much as possible with the checks/optimizations mentioned above. A rough sketch of the dump loop I have in mind is at the bottom of this mail.

> Thanks,
> Han

> > > Also, to "desync" the controllers, a random delay would be added to the "dump period".
> > >
> > > All of this would be applicable to FDB aging as well.
> > >
> > > Does that sound reasonable?
> > > Please let me know if you have any comments/suggestions.
> > >
> > > Thanks,
> > > Ales

Thanks,
Ales

--
Ales Musil
Senior Software Engineer - OVN Core
Red Hat EMEA
https://www.redhat.com
[email protected]
IM: amusil
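Sketch of the per-controller dump loop mentioned above (illustrative only; function and variable names are made up, and the real loop would talk OpenFlow over its own rconn and read the I-P mapping datapath -> cookie -> MAC binding rather than these stand-in callbacks):

```python
# Illustrative pacing of the dump thread: half of the smallest aging threshold
# among local datapaths that have aging enabled, plus random jitter so that
# controllers do not all dump and update at the same moment.
import random
import time

def dump_period(thresholds_s, jitter_frac=0.1):
    base = min(thresholds_s) / 2
    return base + random.uniform(0, jitter_frac * base)

def dump_loop(get_flow_idle_ages, get_timestamp_age, refresh_timestamp,
              thresholds_s):
    """thresholds_s: datapath -> mac_binding_age_threshold (seconds).

    get_flow_idle_ages(): (datapath, binding) -> idle_age of its flow
    get_timestamp_age(binding): age of the binding's SB timestamp
    refresh_timestamp(binding): perform the SB DB update
    """
    while True:
        time.sleep(dump_period(thresholds_s.values()))
        for (datapath, binding), idle_age in get_flow_idle_ages().items():
            threshold = thresholds_s[datapath]
            # Still in use locally and nobody refreshed it recently enough.
            if idle_age < threshold and get_timestamp_age(binding) > 0.75 * threshold:
                refresh_timestamp(binding)
```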
