Re: [ovs-discuss] [OVN] logical flow explosion in lr_in_ip_input table for dnat_and_snat IPs

2020-06-15 Thread Han Zhou
Sorry Girish, I can't promise anything for now. I will see whether I have
time in the next couple of weeks, but anyone is welcome to volunteer for
this if it is urgent.

On Mon, Jun 15, 2020 at 10:56 AM Girish Moodalbail 
wrote:

> Hello Han,
>
> On Wed, Jun 3, 2020 at 9:39 PM Han Zhou  wrote:
>
>>
>>
>> On Wed, Jun 3, 2020 at 7:16 PM Girish Moodalbail 
>> wrote:
>>
>>> Hello all,
>>>
>>> While working on an extension, see the diagram below, to the existing
>>> OVN logical topology for the ovn-kubernetes project, I am seeing an
>>> explosion of the "Reply to ARP requests" logical flows in the
>>> `lr_in_ip_input` table for the distributed router (ovn_cluster_router)
>>> configured with gateway port (rtol-LS)
>>>
>>>                 internet
>>>       -------------+------------>
>>>                    |
>>>                    |
>>> +------------localnet-port------------+
>>> |                 LS                  |
>>> +---------------ltor-LS---------------+
>>>                    |
>>>                    |
>>> +---------------rtol-LS---------------+
>>> |         ovn_cluster_router          |
>>> |        (Distributed Router)         |
>>> +-rtos-ls0-----rtos-ls1-----rtos-ls2--+
>>>       |            |            |
>>>       |            |            |
>>>   +---+---+    +---+---+    +---+---+
>>>   |  LS0  |    |  LS1  |    |  LS2  |
>>>   +---+---+    +---+---+    +---+---+
>>>       |            |            |
>>>       p0           p1           p2
>>>      IA0          IA1          IA2
>>>      EA0          EA1          EA2
>>>    (Node0)      (Node1)      (Node2)
>>>
>>> In the topology above, each of the three logical switch ports has an
>>> internal address of IAx and an external address of EAx (dnat_and_snat IP).
>>> They are all bound to their respective nodes (Nodex). A packet from `p0`
>>> heading towards the internet will be SNAT'ed to EA0 on the local hypervisor
>>> and then sent out through the LS's localnet-port on that hypervisor.
>>> Basically, they are configured for distributed NATing.
>>>
>>> I am seeing interesting "Reply to ARP requests" flows for arp.tpa set to
>>> "EAx". The flows look like this:
>>>
>>> For EA0
>>> priority=90, match=(inport == "rtos-ls0" && arp.tpa == EA0 && arp.op ==
>>> 1), action=(/* ARP reply */)
>>> priority=90, match=(inport == "rtos-ls1" && arp.tpa == EA0 && arp.op ==
>>> 1), action=(/* ARP reply */)
>>> priority=90, match=(inport == "rtos-ls2" && arp.tpa == EA0 && arp.op ==
>>> 1), action=(/* ARP reply */)
>>>
>>> For EA1
>>> priority=90, match=(inport == "rtos-ls0" && arp.tpa == EA1 && arp.op ==
>>> 1), action=(/* ARP reply */)
>>> priority=90, match=(inport == "rtos-ls1" && arp.tpa == EA1 && arp.op ==
>>> 1), action=(/* ARP reply */)
>>> priority=90, match=(inport == "rtos-ls2" && arp.tpa == EA1 && arp.op ==
>>> 1), action=(/* ARP reply */)
>>>
>>> Similarly, for EA2.
>>>
>>> So, we have N * N "Reply to ARP requests" flows for N nodes each with 1
>>> dnat_and_snat ip.
>>> This is causing scale issues.
>>>
>>> If you look at the flows for `EA0`, I am confused as to why they are
>>> needed:
>>>
>>>    1. When would one see an ARP request for EA0 from any of the logical
>>>    switch ports on LS{0,1,2}?
>>>    2. If they are needed at all, can't we just drop the `inport` match
>>>    altogether, since the flow is configured for every logical router
>>>    port except the distributed gateway port rtol-LS? For that port, we
>>>    could add a higher-priority rule with action set to `next`.
>>>    3. Say we don't need east-west NAT connectivity. Is there a way to
>>>    have these ARPs learnt dynamically, as we are doing for the join and
>>>    external logical switches (the other thread [1])?
>>>
>>> Regards,
>>> ~Girish
>>>
>>> [1]
>>> https://mail.openvswitch.org/pipermail/ovs-discuss/2020-May/049994.html
>>>
>>
>> In general, these flows should be per router instead of per router port,
>> since the NAT addresses are not attached to any router port. For
>> distributed gateway ports, there will need to be per-port flows that match
>> is_chassis_resident(gateway-chassis). I think this can be handled by:
>> - priority X + 20 flows for each distributed gateway port with
>> is_chassis_resident(), reply ARP
>> - priority X + 10 flows for each distributed gateway port without
>> is_chassis_resident(), drop
>> - priority X flows for each router (no need to match inport), reply ARP
>>
>> This way, there are N * (2D + 1) flows per router. N = number of NAT IPs,
>> D = number of distributed gateway ports. This would optimize the above
>> scenario where there is only 1 distributed gateway port but many regular
>> router ports. Thoughts?
>>
>
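To put numbers on the two formulas quoted above, here is a small sketch that compares the per-port count (N NAT IPs times the number of regular router ports) with the proposed N * (2D + 1). The function names are made up for illustration, and the node counts are arbitrary sample values, not measurements from a real deployment.

```python
def per_port_flows(nat_ips: int, router_ports: int) -> int:
    """Count described in the thread: one ARP-reply flow per (NAT IP, router port)."""
    return nat_ips * router_ports


def per_router_flows(nat_ips: int, gateway_ports: int) -> int:
    """Proposed scheme: per NAT IP, two flows per distributed gateway port
    (reply and drop) plus one router-wide reply flow."""
    return nat_ips * (2 * gateway_ports + 1)


if __name__ == "__main__":
    # Assumption: one dnat_and_snat IP and one rtos-lsX port per node, and a
    # single distributed gateway port (rtol-LS), as in the topology above.
    for nodes in (3, 100, 1000):
        current = per_port_flows(nat_ips=nodes, router_ports=nodes)
        proposed = per_router_flows(nat_ips=nodes, gateway_ports=1)
        print(f"nodes={nodes}: per-port={current}, per-router={proposed}")
```

With a single distributed gateway port the proposed count grows linearly in the number of NAT IPs, while the per-port count grows quadratically with the number of nodes.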
> We went ahead and added support for this topology in the ovn-kubernetes
> project in this commit:
>
> https://github.com/ovn-org/ovn-kubernetes/commit/edb24e6a71142f2e835b67b29c11e1688c645683
>
>
> Han, I was curious to know whether the above fix is on your radar. Thanks.
>
> The number of 

[ovs-discuss] problems sending to OFPP_NORMAL

2020-06-15 Thread Luca Mancini
Does anyone know of a way to send dp_packets through the OFPP_NORMAL port?
I do not have the corresponding struct xlate_ctx pointer because, to my
knowledge, there is no way of storing it for later use, so I can’t use the
compose_output_action() function.

I tried using ofproto_dpif_send_packet(). It doesn’t work with OFPP_NORMAL,
although it does with all other ports connected to the given switch, and I’m
trying to avoid flooding the packet.

Luca

___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] [OVN] logical flow explosion in lr_in_ip_input table for dnat_and_snat IPs

2020-06-15 Thread Girish Moodalbail
Hello Han,

On Wed, Jun 3, 2020 at 9:39 PM Han Zhou  wrote:

>
>
> On Wed, Jun 3, 2020 at 7:16 PM Girish Moodalbail 
> wrote:
>
>> Hello all,
>>
>> While working on an extension, see the diagram below, to the existing
>> OVN logical topology for the ovn-kubernetes project, I am seeing an
>> explosion of the "Reply to ARP requests" logical flows in the
>> `lr_in_ip_input` table for the distributed router (ovn_cluster_router)
>> configured with gateway port (rtol-LS)
>>
>>                 internet
>>       -------------+------------>
>>                    |
>>                    |
>> +------------localnet-port------------+
>> |                 LS                  |
>> +---------------ltor-LS---------------+
>>                    |
>>                    |
>> +---------------rtol-LS---------------+
>> |         ovn_cluster_router          |
>> |        (Distributed Router)         |
>> +-rtos-ls0-----rtos-ls1-----rtos-ls2--+
>>       |            |            |
>>       |            |            |
>>   +---+---+    +---+---+    +---+---+
>>   |  LS0  |    |  LS1  |    |  LS2  |
>>   +---+---+    +---+---+    +---+---+
>>       |            |            |
>>       p0           p1           p2
>>      IA0          IA1          IA2
>>      EA0          EA1          EA2
>>    (Node0)      (Node1)      (Node2)
>>
>> In the topology above, each of the three logical switch ports has an
>> internal address of IAx and an external address of EAx (dnat_and_snat IP).
>> They are all bound to their respective nodes (Nodex). A packet from `p0`
>> heading towards the internet will be SNAT'ed to EA0 on the local hypervisor
>> and then sent out through the LS's localnet-port on that hypervisor.
>> Basically, they are configured for distributed NATing.
>>
>> I am seeing interesting "Reply to ARP requests" flows for arp.tpa set to
>> "EAx". The flows look like this:
>>
>> For EA0
>> priority=90, match=(inport == "rtos-ls0" && arp.tpa == EA0 && arp.op ==
>> 1), action=(/* ARP reply */)
>> priority=90, match=(inport == "rtos-ls1" && arp.tpa == EA0 && arp.op ==
>> 1), action=(/* ARP reply */)
>> priority=90, match=(inport == "rtos-ls2" && arp.tpa == EA0 && arp.op ==
>> 1), action=(/* ARP reply */)
>>
>> For EA1
>> priority=90, match=(inport == "rtos-ls0" && arp.tpa == EA1 && arp.op ==
>> 1), action=(/* ARP reply */)
>> priority=90, match=(inport == "rtos-ls1" && arp.tpa == EA1 && arp.op ==
>> 1), action=(/* ARP reply */)
>> priority=90, match=(inport == "rtos-ls2" && arp.tpa == EA1 && arp.op ==
>> 1), action=(/* ARP reply */)
>>
>> Similarly, for EA2.
>>
>> So, we have N * N "Reply to ARP requests" flows for N nodes each with 1
>> dnat_and_snat ip.
>> This is causing scale issues.
>>
>> If you look at the flows for `EA0`, I am confused as to why they are
>> needed:
>>
>>    1. When would one see an ARP request for EA0 from any of the logical
>>    switch ports on LS{0,1,2}?
>>    2. If they are needed at all, can't we just drop the `inport` match
>>    altogether, since the flow is configured for every logical router
>>    port except the distributed gateway port rtol-LS? For that port, we
>>    could add a higher-priority rule with action set to `next`.
>>    3. Say we don't need east-west NAT connectivity. Is there a way to
>>    have these ARPs learnt dynamically, as we are doing for the join and
>>    external logical switches (the other thread [1])?
>>
>> Regards,
>> ~Girish
>>
>> [1]
>> https://mail.openvswitch.org/pipermail/ovs-discuss/2020-May/049994.html
>>
>
> In general, these flows should be per router instead of per router port,
> since the NAT addresses are not attached to any router port. For
> distributed gateway ports, there will need to be per-port flows that match
> is_chassis_resident(gateway-chassis). I think this can be handled by:
> - priority X + 20 flows for each distributed gateway port with
> is_chassis_resident(), reply ARP
> - priority X + 10 flows for each distributed gateway port without
> is_chassis_resident(), drop
> - priority X flows for each router (no need to match inport), reply ARP
>
> This way, there are N * (2D + 1) flows per router. N = number of NAT IPs,
> D = number of distributed gateway ports. This would optimize the above
> scenario where there is only 1 distributed gateway port but many regular
> router ports. Thoughts?
>
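A rough sketch of the three-tier priority scheme proposed above, written as a generator of logical-flow records. This is not ovn-northd code: the `LogicalFlow` type, the `arp_reply_flows` helper, the base priority of 90 (taken from the flows shown earlier) and the `cr-<port>` name passed to is_chassis_resident() are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LogicalFlow:
    priority: int
    match: str
    action: str


def arp_reply_flows(nat_ips: List[str],
                    gateway_ports: List[str],
                    base_priority: int = 90) -> List[LogicalFlow]:
    """Build N * (2D + 1) ARP flows for N NAT IPs and D distributed gateway ports."""
    flows = []
    for ip in nat_ips:
        arp = f"arp.tpa == {ip} && arp.op == 1"
        for port in gateway_ports:
            # Assumed naming: the chassis-redirect port of <port> is "cr-<port>".
            resident = f'is_chassis_resident("cr-{port}")'
            # X + 20: reply only on the chassis where the gateway port resides.
            flows.append(LogicalFlow(
                base_priority + 20,
                f'inport == "{port}" && {resident} && {arp}',
                "/* ARP reply */"))
            # X + 10: same port on any other chassis falls through to a drop.
            flows.append(LogicalFlow(
                base_priority + 10,
                f'inport == "{port}" && {arp}',
                "drop;"))
        # X: one router-wide reply flow, with no inport match.
        flows.append(LogicalFlow(base_priority, arp, "/* ARP reply */"))
    return flows


if __name__ == "__main__":
    for flow in arp_reply_flows(["EA0", "EA1", "EA2"], ["rtol-LS"]):
        print(f"priority={flow.priority}, match=({flow.match}), "
              f"action=({flow.action})")
```

For the three-node topology this yields 3 * (2 * 1 + 1) = 9 flows, and the count no longer depends on how many rtos-lsX ports the router has.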

We went ahead and added support for this topology in the ovn-kubernetes
project in this commit:
https://github.com/ovn-org/ovn-kubernetes/commit/edb24e6a71142f2e835b67b29c11e1688c645683


Han, I was curious to know whether the above fix is on your radar. Thanks.

The number of OpenFlow flows in each of the hypervisors is insanely high
and is consuming a lot of memory.

Regards,
~Girish





>
> Thanks,
> Han
>


[ovs-discuss] [OVN] running bfd on ecmp routes?

2020-06-15 Thread Tim Rozet
Hi All,
While looking into using ecmp routes for an OVN router I noticed there is
no support for BFD on these routes. Would it be possible to add this
capability? I would like the next hop to be removed from the openflow group
if BFD detection for that next hop goes down. My routes in this case would
be on a GR for N/S external next hop and not going across a tunnel as it
egresses.
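
To make the requested behaviour concrete, here is a minimal model of it. This is not how OVN or an OpenFlow select group is implemented; the `select_next_hop` function, the CRC-based hash and the 192.0.2.x addresses are hypothetical stand-ins.

```python
import zlib
from typing import Dict, List, Optional


def select_next_hop(flow_key: bytes,
                    next_hops: List[str],
                    bfd_up: Dict[str, bool]) -> Optional[str]:
    """Pick one ECMP next hop for a flow, skipping hops whose BFD session is down."""
    # Only next hops with a live BFD session stay in the candidate set.
    live = [hop for hop in next_hops if bfd_up.get(hop, False)]
    if not live:
        return None  # no usable next hop; the caller decides what to do
    # Hash the flow key so that packets of one flow keep using the same hop.
    return live[zlib.crc32(flow_key) % len(live)]


if __name__ == "__main__":
    hops = ["192.0.2.1", "192.0.2.2"]
    status = {"192.0.2.1": True, "192.0.2.2": False}
    # With 192.0.2.2 marked down, every flow is steered to 192.0.2.1.
    print(select_next_hop(b"10.0.0.5->198.51.100.7", hops, status))
```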

Thanks,

Tim Rozet
Red Hat CTO Networking Team