On 2/10/25 10:13, Dumitru Ceara wrote: > On 2/9/25 9:47 PM, Numan Siddique wrote: >> On Sat, Feb 8, 2025 at 2:34 AM Han Zhou <[email protected]> wrote: >>> >>> >>> >>> On Fri, Feb 7, 2025 at 1:09 PM Ilya Maximets <[email protected]> wrote: >>>> >>>> On 2/7/25 21:45, Ilya Maximets wrote: >>>>> On 2/7/25 15:01, Ilya Maximets wrote: >>>>>> On 2/7/25 05:31, Numan Siddique wrote: >>>>>>> On Thu, Feb 6, 2025 at 3:57 AM Ilya Maximets <[email protected]> wrote: >>>>>>>> >>>>>>>> This patch set introduces ability to directly connect switches, >>>>>>>> including transit switches, in order to achieve higher total port count >>>>>>>> and locality of changes in L2 topologies spread across multiple >>>>>>>> availability zones. And while tailored for this use case, the changes >>>>>>>> do not impose any limitations and should allow for all kinds of other >>>>>>>> different topologies. >>>>>>>> >>>>>>>> Amount of the logic code changes is relatively small, most of the diff >>>>>>>> are new tests for the introduced functionality. >>>>>>>> >>>>>>>> >>>>>>>> Version 2: >>>>>>>> >>>>>>>> * Rebased on top of latest changes on the main branch. >>>>>>>> * Improved validation of the peer in northd. [Mark] >>>>>>>> * Added a test for a switch port with an address set. [Mark] >>>>>>>> >>>>>>>> >>>>>>>> Ilya Maximets (2): >>>>>>>> northd: Add support for spine-leaf logical switch topology. >>>>>>>> ic: Add support for spine-leaf topology for transit switches. >>>>>>> >>>>>>> Hi Ilya, >>>>>>> >>>>>>> Thanks for adding this feature. I applied both the patches to the >>>>>>> main. I had to do a minor rebase. >>>>>> >>>>>> Thanks, Numan and Mark! >>>>>> >>>>>>> >>>>>>> I see one small issue with the spine switch. Since all the spine >>>>>>> switch ports have unknown address, >>>>>>> the packet will be cloned to N - 1 logical switches if N switches >>>>>>> connect to it. >>>>>>> >>>>>>> I think we should enable "fdb learn" in the spine switch with the below >>>>>>> changes >>>>>>> >>>>>>> ----------------------------------------------- >>>>>>> diff --git a/northd/northd.c b/northd/northd.c >>>>>>> index 880112c3b9..e77936fbe9 100644 >>>>>>> --- a/northd/northd.c >>>>>>> +++ b/northd/northd.c >>>>>>> @@ -5682,6 +5682,7 @@ build_lswitch_learn_fdb_op( >>>>>>> ovs_assert(op->nbsp); >>>>>>> >>>>>>> if (!op->n_ps_addrs && op->has_unknown && (!strcmp(op->nbsp->type, >>>>>>> "") || >>>>>>> + !strcmp(op->nbsp->type, "switch") || >>>>>>> (lsp_is_localnet(op->nbsp) && >>>>>>> localnet_can_learn_mac(op->nbsp)))) { >>>>>>> ds_clear(match); >>>>>>> ds_clear(actions); >>>>>>> ----------------------------------------------- >>>>>>> >>>>>>> What do you think ? Any concerns or objections ? If not, can you >>>>>>> submit a follow up patch to enable this ? >>>>>> >>>>>> This seems reasonable. I missed the part that we enable FDB >>>>>> automatically >>>>>> for some ports with "unknown". Will look into that and submit a fix. >>>>> >>>>> So, I gave this some more thoughts and I'm not sure if we actually need >>>>> FDB for "switch" ports in a common case. In a non-IC setup OVN actually >>>>> knows the destination, if it's part of OVN network. The whole processing >>>>> of the spine switch will happen on the source node and only the egress >>>>> pipeline of the leaf switch will be executed on the destination node. >>>>> If we have actual VM ports with "unknown", then those will have the FDB >>>>> enabled, but we should not need to consult FDB for the spine switch itself >>>>> otherwise. Does that make sense? >> >> You're right. But enabling mac learning for "switch" ports does have >> some benefits. >> By clone what I mean is that during the first packet upcall flow >> translation, vswitchd will >> clone the packet to all the leaf switch pipelines and yes, the final >> datapath flow will >> of course not clone the packets. So it's not a big deal. But I >> think it's better >> to enable fdb for switch ports. >> >> >>>> >>>> Thinking more about this again, there is still an issue if we have >>>> "unknown" >>>> ports in leaf switches. Since those switches are separate from the spine, >>>> will broadcast in the spine and then every leaf that has "unknown" will >>>> also >>>> broadcast within itself. And so, "unknown" VM ports will receive traffic >>>> destined to ports on other leaf switches. So, yes, we still need FDB for >>>> this case. >> >> Ack. >> >>>> >>>>> >>>>> OTOH, what we may actually need is FDB learning for remote ports in IC >>>>> setup. In this case, OVN doesn't actually know the whole topology and >>>>> can't figure out to which of the remote ports the packet should go, so it >>>>> will broadcast. The problem here, however, is that FDB stages are in the >>>>> ingress pipeline, and they will not be executed on the remote node that >>>>> actually needs them. So, we may need to create FDB stages in the egress >>>>> pipeline for packets coming from remote ports. Seems like a generic >>>>> issue with "unknown" ports on a transit switch. >>>>> >>>>> Mark, Numan, Dumitru, what do you think? >>>> >>>> But this issue also still stands as we need a way to learn MAC addresses >>>> from packet arriving from a remote port, so we either need FDB in the >>>> egress >>>> pipeline of the transit switch for remote ports, or we need an external >>>> service like ovn-ic synchronizing FDB entries between AZs, which is not >>>> good. >>>> >>> >>> Hi Ilya, I agree with you that synchronizing FDB entries between AZs is a >>> really bad idea. FDB in the egress pipeline (and learning the mac for >>> inport of the spine switch) sounds reasonable. >> >> I agree. FDB in the egress pipeline sounds reasonable. >> > > +1 > > I think it's reasonable implementation for all "learning features" in IC > mode. We already do that (for the same reason) for IGMP/MLD: > > https://github.com/ovn-org/ovn/commit/c6b20c9940ed41f028347f559b21a25fb7625eb4 > >> If a spine transit switch has N ports, then each packet (both request >> and reply) will get flooded to >> all the remote ports of the spine transit switch. I tested it and I >> can confirm. >> >> If we can't add FDB in the egress pipeline in a reasonable time for >> 25.03, I'd suggest documenting this >> flood behavior and mark this feature as experimental so that the users >> of 25.03 are well informed of this. >> > > Sounds like something we can add after branching (either learning in the > egress pipeline or marking the feature as experimental) indeed.
Thanks, all! I think, the fixes should be relatively simple, I'll try to post some patches patches today or tomorrow. Best regards, Ilya Maximets. _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
