On 2/9/25 9:47 PM, Numan Siddique wrote: > On Sat, Feb 8, 2025 at 2:34 AM Han Zhou <[email protected]> wrote: >> >> >> >> On Fri, Feb 7, 2025 at 1:09 PM Ilya Maximets <[email protected]> wrote: >>> >>> On 2/7/25 21:45, Ilya Maximets wrote: >>>> On 2/7/25 15:01, Ilya Maximets wrote: >>>>> On 2/7/25 05:31, Numan Siddique wrote: >>>>>> On Thu, Feb 6, 2025 at 3:57 AM Ilya Maximets <[email protected]> wrote: >>>>>>> >>>>>>> This patch set introduces ability to directly connect switches, >>>>>>> including transit switches, in order to achieve higher total port count >>>>>>> and locality of changes in L2 topologies spread across multiple >>>>>>> availability zones. And while tailored for this use case, the changes >>>>>>> do not impose any limitations and should allow for all kinds of other >>>>>>> different topologies. >>>>>>> >>>>>>> Amount of the logic code changes is relatively small, most of the diff >>>>>>> are new tests for the introduced functionality. >>>>>>> >>>>>>> >>>>>>> Version 2: >>>>>>> >>>>>>> * Rebased on top of latest changes on the main branch. >>>>>>> * Improved validation of the peer in northd. [Mark] >>>>>>> * Added a test for a switch port with an address set. [Mark] >>>>>>> >>>>>>> >>>>>>> Ilya Maximets (2): >>>>>>> northd: Add support for spine-leaf logical switch topology. >>>>>>> ic: Add support for spine-leaf topology for transit switches. >>>>>> >>>>>> Hi Ilya, >>>>>> >>>>>> Thanks for adding this feature. I applied both the patches to the >>>>>> main. I had to do a minor rebase. >>>>> >>>>> Thanks, Numan and Mark! >>>>> >>>>>> >>>>>> I see one small issue with the spine switch. Since all the spine >>>>>> switch ports have unknown address, >>>>>> the packet will be cloned to N - 1 logical switches if N switches >>>>>> connect to it. >>>>>> >>>>>> I think we should enable "fdb learn" in the spine switch with the below >>>>>> changes >>>>>> >>>>>> ----------------------------------------------- >>>>>> diff --git a/northd/northd.c b/northd/northd.c >>>>>> index 880112c3b9..e77936fbe9 100644 >>>>>> --- a/northd/northd.c >>>>>> +++ b/northd/northd.c >>>>>> @@ -5682,6 +5682,7 @@ build_lswitch_learn_fdb_op( >>>>>> ovs_assert(op->nbsp); >>>>>> >>>>>> if (!op->n_ps_addrs && op->has_unknown && (!strcmp(op->nbsp->type, >>>>>> "") || >>>>>> + !strcmp(op->nbsp->type, "switch") || >>>>>> (lsp_is_localnet(op->nbsp) && >>>>>> localnet_can_learn_mac(op->nbsp)))) { >>>>>> ds_clear(match); >>>>>> ds_clear(actions); >>>>>> ----------------------------------------------- >>>>>> >>>>>> What do you think ? Any concerns or objections ? If not, can you >>>>>> submit a follow up patch to enable this ? >>>>> >>>>> This seems reasonable. I missed the part that we enable FDB automatically >>>>> for some ports with "unknown". Will look into that and submit a fix. >>>> >>>> So, I gave this some more thoughts and I'm not sure if we actually need >>>> FDB for "switch" ports in a common case. In a non-IC setup OVN actually >>>> knows the destination, if it's part of OVN network. The whole processing >>>> of the spine switch will happen on the source node and only the egress >>>> pipeline of the leaf switch will be executed on the destination node. >>>> If we have actual VM ports with "unknown", then those will have the FDB >>>> enabled, but we should not need to consult FDB for the spine switch itself >>>> otherwise. Does that make sense? > > You're right. But enabling mac learning for "switch" ports does have > some benefits. > By clone what I mean is that during the first packet upcall flow > translation, vswitchd will > clone the packet to all the leaf switch pipelines and yes, the final > datapath flow will > of course not clone the packets. So it's not a big deal. But I > think it's better > to enable fdb for switch ports. > > >>> >>> Thinking more about this again, there is still an issue if we have "unknown" >>> ports in leaf switches. Since those switches are separate from the spine, >>> will broadcast in the spine and then every leaf that has "unknown" will also >>> broadcast within itself. And so, "unknown" VM ports will receive traffic >>> destined to ports on other leaf switches. So, yes, we still need FDB for >>> this case. > > Ack. > >>> >>>> >>>> OTOH, what we may actually need is FDB learning for remote ports in IC >>>> setup. In this case, OVN doesn't actually know the whole topology and >>>> can't figure out to which of the remote ports the packet should go, so it >>>> will broadcast. The problem here, however, is that FDB stages are in the >>>> ingress pipeline, and they will not be executed on the remote node that >>>> actually needs them. So, we may need to create FDB stages in the egress >>>> pipeline for packets coming from remote ports. Seems like a generic >>>> issue with "unknown" ports on a transit switch. >>>> >>>> Mark, Numan, Dumitru, what do you think? >>> >>> But this issue also still stands as we need a way to learn MAC addresses >>> from packet arriving from a remote port, so we either need FDB in the egress >>> pipeline of the transit switch for remote ports, or we need an external >>> service like ovn-ic synchronizing FDB entries between AZs, which is not >>> good. >>> >> >> Hi Ilya, I agree with you that synchronizing FDB entries between AZs is a >> really bad idea. FDB in the egress pipeline (and learning the mac for inport >> of the spine switch) sounds reasonable. > > I agree. FDB in the egress pipeline sounds reasonable. >
+1 I think it's reasonable implementation for all "learning features" in IC mode. We already do that (for the same reason) for IGMP/MLD: https://github.com/ovn-org/ovn/commit/c6b20c9940ed41f028347f559b21a25fb7625eb4 > If a spine transit switch has N ports, then each packet (both request > and reply) will get flooded to > all the remote ports of the spine transit switch. I tested it and I > can confirm. > > If we can't add FDB in the egress pipeline in a reasonable time for > 25.03, I'd suggest documenting this > flood behavior and mark this feature as experimental so that the users > of 25.03 are well informed of this. > Sounds like something we can add after branching (either learning in the egress pipeline or marking the feature as experimental) indeed. Thanks, Dumitru _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
