Re: [ovs-dev] ovn: Handling of arp/nd learning on logical switches

2023-03-06 Thread Ilya Maximets
On 3/3/23 11:20, Olaf Seibert via dev wrote:
> Tangentially related to this: I was trying to detect when the problem with 
> the "too many resubmits" occurs, for alerting purposes.
> Some light examination of the source of ovs suggested that the coverage 
> counter "drop_action_too_many_resubmit" corresponds to the error 
> "XLATE_TOO_MANY_RESUBMITS".
> 
> So I saw this very recent log message in 
> /var/log/openvswitch/ovs-vswitchd.log (I censored some addresses):
> 
> 2023-03-03T10:06:51.453Z|00518|ofproto_dpif_xlate(handler34)|WARN|over 4096 
> resubmit actions on bridge br-int while processing 
> arp,in_port=1,vlan_tci=0x,dl_src=zz:zz:zz:zz:zz:zz,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=xxx.xxx.xxx.xxx,arp_tpa=yyy.yyy.yyy.yyy,arp_op=1,arp_sha=vv:vv:vv:vv:vv:vv,arp_tha=00:00:00:00:00:00
> 
> but the coverage counter seems to ignore this:
> 
> # ovs-appctl coverage/read-counter drop_action_too_many_resubmit
> 0
> 
> Is this a bug or am I looking at the wrong thing(s)?

Coverage counters are collected only for actions executed in userspace.
In your case, if XLATE_TOO_MANY_RESUBMITS happens, the 'drop' flow
will be installed into kernel datapath and the actual packet drops will
happen in the kernel.  Coverage counters can't count these.
So, drop_action_* counters are only useful for userspace datapath at
the moment.
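
For alerting with the kernel datapath, one option is to watch for the log
message itself rather than the counter. A minimal sketch in Python, assuming
the log line format from the quoted excerpt; the function name is made up
for illustration, and since such warnings are normally rate limited the
result is best treated as a lower bound:

```python
import re

# Matches the WARN line format shown in the quoted log excerpt.
RESUBMIT_WARN = re.compile(
    r"^(?P<ts>\S+)\|\d+\|ofproto_dpif_xlate(?:\([^)]*\))?\|WARN\|"
    r"over (?P<limit>\d+) resubmit actions on bridge (?P<bridge>\S+)"
)


def count_resubmit_warnings(lines):
    """Return {bridge: count} of 'too many resubmits' warnings in lines."""
    counts = {}
    for line in lines:
        m = RESUBMIT_WARN.match(line)
        if m:
            bridge = m.group("bridge")
            counts[bridge] = counts.get(bridge, 0) + 1
    return counts


# Sample input modelled on the quoted message (flow details shortened).
sample = [
    "2023-03-03T10:06:51.453Z|00518|ofproto_dpif_xlate(handler34)|WARN|"
    "over 4096 resubmit actions on bridge br-int while processing arp,in_port=1",
    "2023-03-03T10:06:52.000Z|00519|bridge|INFO|unrelated message",
]
print(count_resubmit_warnings(sample))  # {'br-int': 1}
```

In practice the lines would come from /var/log/openvswitch/ovs-vswitchd.log,
e.g. by passing the open file object to the function.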

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] ovn: Handling of arp/nd learning on logical switches

2023-03-03 Thread Olaf Seibert via dev

Tangentially related to this: I was trying to detect when the problem with the "too 
many resubmits" occurs, for alerting purposes.
Some light examination of the source of ovs suggested that the coverage counter 
"drop_action_too_many_resubmit" corresponds to the error 
"XLATE_TOO_MANY_RESUBMITS".

So I saw this very recent log message in /var/log/openvswitch/ovs-vswitchd.log 
(I censored some addresses):

2023-03-03T10:06:51.453Z|00518|ofproto_dpif_xlate(handler34)|WARN|over 4096 
resubmit actions on bridge br-int while processing 
arp,in_port=1,vlan_tci=0x,dl_src=zz:zz:zz:zz:zz:zz,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=xxx.xxx.xxx.xxx,arp_tpa=yyy.yyy.yyy.yyy,arp_op=1,arp_sha=vv:vv:vv:vv:vv:vv,arp_tha=00:00:00:00:00:00

but the coverage counter seems to ignore this:

# ovs-appctl coverage/read-counter drop_action_too_many_resubmit
0

Is this a bug or am I looking at the wrong thing(s)?

Cheers,
-Olaf.

--
Olaf Seibert
Site Reliability Engineer


[ovs-dev] ovn: Handling of arp/nd learning on logical switches

2023-03-03 Thread Felix Hüttner via dev
Hello everyone,

We had a discussion on ovs-discuss that we wanted to bring here [1]:

Assume a physical network connected to an OVN Logical_Switch, which in turn is 
connected to multiple Logical_Routers, like so:

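+----------------+
| Logical Router |
| lr001          +-+
+----------------+ |
                   |
+----------------+ |
| Logical Router | | +----------------+ +------------------+
| lr002          +-+-+ Logical Switch +-+ Physical Network |
+----------------+ | | ls-ext         | |                  |
                   | +----------------+ +------------------+
       ...         |
                   |
+----------------+ |
| Logical Router | |
| lr300          +-+
+----------------+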

If a multicast packet now comes in to ls-ext from the physical network (in our 
case an ARP request or a neighbor discovery packet), it is flooded to all 
attached LSPs.
The logical routers then each look up the source of the ARP request/ND in 
their ARP/neighbor table and insert it if it is not found.
For the picture above that means that a single ARP packet can trigger 300 
lookup_arp and put_arp actions, and each put_arp results in a controller 
action.
The outcome is the same for all 300 logical routers: each of them inserts the 
MAC/IP binding into the MAC_Binding table, and flows are added for each of 
them.
This puts load on the ovn-controllers and the southbound database. Also, some 
of the put_arp actions can easily be lost due to the queue limit towards 
ovn-controller.
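
To make the fan-out concrete, here is a toy model in Python of one flooded 
ARP request. It is only a sketch: the function name and the queue_limit value 
are made up for illustration and do not correspond to any OVN default.

```python
from collections import deque


def flood_arp(num_routers, queue_limit):
    """Flood one ARP request to every router port on the switch.

    Each router that does not know the sender yet emits one put_arp,
    which is queued towards the controller. queue_limit is a made-up
    bound used only to illustrate loss; it is not an OVN default.
    """
    queue = deque()
    dropped = 0
    for i in range(1, num_routers + 1):
        action = ("put_arp", f"lr{i:03d}")
        if len(queue) < queue_limit:
            queue.append(action)
        else:
            dropped += 1
    return len(queue), dropped


queued, dropped = flood_arp(num_routers=300, queue_limit=200)
print(queued, dropped)  # 200 100
```

All 300 queued actions would carry the same MAC/IP pair, so any that are 
dropped leave individual routers without the binding.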

In the mailing list thread mentioned above ([1]), moving this learning process 
to the Logical Switch was discussed.
This would mean that:
1. we only need to handle the learning in one location (and therefore only do 
it once)
2. the MAC_Binding table in the southbound database does not contain (roughly) 
the same entry 300 times
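
The effect of point 2 can be sketched with a small Python model. Keying a 
binding by the switch is the proposal under discussion here, not how 
MAC_Binding works today, and the function name and addresses are made up for 
illustration:

```python
def mac_binding_rows(router_ports, neighbors, per_switch):
    """Return the set of binding keys that learning would create.

    router_ports: names of the router ports attached to the switch
    neighbors:    (ip, mac) pairs seen on the physical network
    per_switch:   if True, key bindings by the switch (the proposal),
                  otherwise by each router port (one row per router)
    """
    rows = set()
    for ip, mac in neighbors:
        if per_switch:
            rows.add(("ls-ext", ip, mac))
        else:
            for port in router_ports:
                rows.add((port, ip, mac))
    return rows


ports = [f"lr{i:03d}" for i in range(1, 301)]
seen = [("192.0.2.1", "00:00:5e:00:53:01")]  # documentation-range addresses

print(len(mac_binding_rows(ports, seen, per_switch=False)))  # 300
print(len(mac_binding_rows(ports, seen, per_switch=True)))   # 1
```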

However, we were unsure whether anything speaks against such an approach.
From my naive understanding I would propose the following changes:

1. move the whole lr_in_lookup_neighbor (table 1) and lr_in_learn_neighbor 
(table 2) from the logical router pipeline to the logical switch pipeline 
between ls_in_hairpin (table 18) and ls_in_arp_rsp (table 19)

2a. Leave the ARP resolve stage in the logical routers: teach 
add_neighbor_flows to learn from the MAC_Binding table not only based on 
logical_ports but also based on the datapaths these ports are connected to

2b. Or move the arp resolve stage to the logical switches:
   1) move lr_in_arp_resolve (table 17) and lr_in_arp_request (table 21) from 
the logical router to the logical switch pipeline
   2) clarify what we would then use as the source address for ARP requests 
that would then originate from the logical switch pipeline
   3) clarify how we signal to the logical switch that it should actually do 
the arp lookup (as static mac_bindings for individual routers probably still 
need to work).
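
As a rough illustration of change 1, the following Python sketch inserts two 
stages between ls_in_hairpin and ls_in_arp_rsp and shows how the later table 
numbers shift. The ls_in_* names for the moved stages are hypothetical, and 
only the two stages named above are modelled:

```python
def insert_stages(stages, after, new_stages):
    """Return a new stage list with new_stages placed right after `after`."""
    idx = stages.index(after) + 1
    return stages[:idx] + new_stages + stages[idx:]


FIRST_TABLE = 18  # table number of ls_in_hairpin, as given above
ls_ingress = ["ls_in_hairpin", "ls_in_arp_rsp"]

# Hypothetical names for the stages moved over from the router pipeline.
moved = ["ls_in_lookup_neighbor", "ls_in_learn_neighbor"]

new_pipeline = insert_stages(ls_ingress, "ls_in_hairpin", moved)
for table, stage in enumerate(new_pipeline, start=FIRST_TABLE):
    print(table, stage)
# 18 ls_in_hairpin
# 19 ls_in_lookup_neighbor
# 20 ls_in_learn_neighbor
# 21 ls_in_arp_rsp
```

Every logical switch stage after the insertion point moves up by two tables.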

I would for now tend to just move the learning stage to the logical switches 
while keeping the arp resolution stage on the logical router side.

I guess I have overlooked something important in here as well, so it would be 
great if I could get feedback on your views on this.

Thanks

[1]: https://mail.openvswitch.org/pipermail/ovs-discuss/2023-March/052268.html

--
Felix Huettner
