On 2/10/25 10:13, Dumitru Ceara wrote:
> On 2/9/25 9:47 PM, Numan Siddique wrote:
>> On Sat, Feb 8, 2025 at 2:34 AM Han Zhou <[email protected]> wrote:
>>>
>>>
>>>
>>> On Fri, Feb 7, 2025 at 1:09 PM Ilya Maximets <[email protected]> wrote:
>>>>
>>>> On 2/7/25 21:45, Ilya Maximets wrote:
>>>>> On 2/7/25 15:01, Ilya Maximets wrote:
>>>>>> On 2/7/25 05:31, Numan Siddique wrote:
>>>>>>> On Thu, Feb 6, 2025 at 3:57 AM Ilya Maximets <[email protected]> wrote:
>>>>>>>>
>>>>>>>> This patch set introduces ability to directly connect switches,
>>>>>>>> including transit switches, in order to achieve higher total port count
>>>>>>>> and locality of changes in L2 topologies spread across multiple
>>>>>>>> availability zones.  And while tailored for this use case, the changes
>>>>>>>> do not impose any limitations and should allow for all kinds of other
>>>>>>>> different topologies.
>>>>>>>>
>>>>>>>> Amount of the logic code changes is relatively small, most of the diff
>>>>>>>> are new tests for the introduced functionality.
>>>>>>>>
>>>>>>>>
>>>>>>>> Version 2:
>>>>>>>>
>>>>>>>>   * Rebased on top of latest changes on the main branch.
>>>>>>>>   * Improved validation of the peer in northd. [Mark]
>>>>>>>>   * Added a test for a switch port with an address set. [Mark]
>>>>>>>>
>>>>>>>>
>>>>>>>> Ilya Maximets (2):
>>>>>>>>   northd: Add support for spine-leaf logical switch topology.
>>>>>>>>   ic: Add support for spine-leaf topology for transit switches.
>>>>>>>
>>>>>>> Hi Ilya,
>>>>>>>
>>>>>>> Thanks for adding this feature.  I applied both the patches to the
>>>>>>> main.  I had to do a minor rebase.
>>>>>>
>>>>>> Thanks, Numan and Mark!
>>>>>>
>>>>>>>
>>>>>>> I see one small issue  with the spine switch.  Since all the spine
>>>>>>> switch ports have unknown address,
>>>>>>> the packet will be cloned to N - 1 logical switches if N switches 
>>>>>>> connect to it.
>>>>>>>
>>>>>>> I think we should enable "fdb learn" in the spine switch with the below 
>>>>>>> changes
>>>>>>>
>>>>>>> -----------------------------------------------
>>>>>>> diff --git a/northd/northd.c b/northd/northd.c
>>>>>>> index 880112c3b9..e77936fbe9 100644
>>>>>>> --- a/northd/northd.c
>>>>>>> +++ b/northd/northd.c
>>>>>>> @@ -5682,6 +5682,7 @@ build_lswitch_learn_fdb_op(
>>>>>>>      ovs_assert(op->nbsp);
>>>>>>>
>>>>>>>      if (!op->n_ps_addrs && op->has_unknown && (!strcmp(op->nbsp->type, 
>>>>>>> "") ||
>>>>>>> +        !strcmp(op->nbsp->type, "switch") ||
>>>>>>>          (lsp_is_localnet(op->nbsp) && 
>>>>>>> localnet_can_learn_mac(op->nbsp)))) {
>>>>>>>          ds_clear(match);
>>>>>>>          ds_clear(actions);
>>>>>>> -----------------------------------------------
>>>>>>>
>>>>>>> What do you think ?  Any concerns or objections ?   If not, can you
>>>>>>> submit a follow up patch to enable this ?
>>>>>>
>>>>>> This seems reasonable.  I missed the part that we enable FDB 
>>>>>> automatically
>>>>>> for some ports with "unknown".  Will look into that and submit a fix.
>>>>>
>>>>> So, I gave this some more thoughts and I'm not sure if we actually need
>>>>> FDB for "switch" ports in a common case.  In a non-IC setup OVN actually
>>>>> knows the destination, if it's part of OVN network.  The whole processing
>>>>> of the spine switch will happen on the source node and only the egress
>>>>> pipeline of the leaf switch will be executed on the destination node.
>>>>> If we have actual VM ports with "unknown", then those will have the FDB
>>>>> enabled, but we should not need to consult FDB for the spine switch itself
>>>>> otherwise.  Does that make sense?
>>
>> You're right.  But enabling mac learning for "switch" ports does have
>> some benefits.
>>  By clone what I mean is that during the first packet upcall flow
>> translation, vswitchd will
>> clone the packet to all the leaf switch pipelines and yes, the final
>> datapath flow will
>> of course not clone the packets.    So it's not a big deal.  But I
>> think it's better
>> to enable fdb for switch ports.
>>
>>
>>>>
>>>> Thinking more about this again, there is still an issue if we have 
>>>> "unknown"
>>>> ports in leaf switches.  Since those switches are separate from the spine,
>>>> will broadcast in the spine and then every leaf that has "unknown" will 
>>>> also
>>>> broadcast within itself.  And so, "unknown"  VM ports will receive traffic
>>>> destined to ports on other leaf switches.  So, yes, we still need FDB for
>>>> this case.
>>
>> Ack.
>>
>>>>
>>>>>
>>>>> OTOH, what we may actually need is FDB learning for remote ports in IC
>>>>> setup.  In this case, OVN doesn't actually know the whole topology and
>>>>> can't figure out to which of the remote ports the packet should go, so it
>>>>> will broadcast.  The problem here, however, is that FDB stages are in the
>>>>> ingress pipeline, and they will not be executed on the remote node that
>>>>> actually needs them.  So, we may need to create FDB stages in the egress
>>>>> pipeline for packets coming from remote ports.  Seems like a generic
>>>>> issue with "unknown" ports on a transit switch.
>>>>>
>>>>> Mark, Numan, Dumitru, what do you think?
>>>>
>>>> But this issue also still stands as we need a way to learn MAC addresses
>>>> from packet arriving from a remote port, so we either need FDB in the 
>>>> egress
>>>> pipeline of the transit switch for remote ports, or we need an external
>>>> service like ovn-ic synchronizing FDB entries between AZs, which is not 
>>>> good.
>>>>
>>>
>>> Hi Ilya, I agree with you that synchronizing FDB entries between AZs is a 
>>> really bad idea. FDB in the egress pipeline (and learning the mac for 
>>> inport of the spine switch) sounds reasonable.
>>
>> I agree.  FDB in the egress pipeline sounds reasonable.
>>
> 
> +1
> 
> I think it's reasonable implementation for all "learning features" in IC
> mode.  We already do that (for the same reason) for IGMP/MLD:
> 
> https://github.com/ovn-org/ovn/commit/c6b20c9940ed41f028347f559b21a25fb7625eb4
> 
>> If a spine transit switch has N ports,  then each packet (both request
>> and reply) will get flooded to
>> all the remote ports of the spine transit switch.  I tested it and I
>> can confirm.
>>
>> If we can't add FDB in the egress pipeline in a reasonable time for
>> 25.03,  I'd suggest documenting this
>> flood behavior and mark this feature as experimental so that the users
>> of 25.03 are well informed of this.
>>
> 
> Sounds like something we can add after branching (either learning in the
> egress pipeline or marking the feature as experimental) indeed.

Thanks, all!  I think, the fixes should be relatively simple, I'll try
to post some patches patches today or tomorrow.

Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to