Re: [ovs-discuss] northd: amount of ref_chassis in ha_chassis_group

2023-08-29 Thread Felix Huettner via discuss
Hi everyone,

thank you all for that work. That looks incredibly helpful.


Re: [ovs-discuss] northd: amount of ref_chassis in ha_chassis_group

2023-08-24 Thread Roberto Bartzen Acosta via discuss
Hi Max and Stack IT folks,

On Thu, Aug 24, 2023 at 14:31, Ilya Maximets via discuss <
ovs-discuss@openvswitch.org> wrote:

> Hi, Max.
>
> I might have solved some of these issues while investigating a different
> issue report:
>
> https://mail.openvswitch.org/pipermail/ovs-discuss/2023-August/052614.html
>
> With these two patches:
>
> https://patchwork.ozlabs.org/project/ovn/patch/20230823214140.1779255-1-i.maxim...@ovn.org/
>
> https://patchwork.ozlabs.org/project/ovn/patch/20230823215705.1786348-1-i.maxim...@ovn.org/


These patches showed an impressive gain! They reduce the northd recompute time
to "practically zero" and reduce the CPU usage of the
Northbound/Southbound ovsdb processes. With this new behavior, the processing
of SB port events is practically instantaneous.



>
>
> Han pointed me to this thread, as it seems like the issue you're facing
> is practically the same.
>
> I agree though that there is a potential for even further improvement as at
> least there should be a way to not duplicate all the chassis in each group.
> But I hope that the patches above are enough for now.
>
> Best regards, Ilya Maximets.

___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] northd: amount of ref_chassis in ha_chassis_group

2023-08-24 Thread Ilya Maximets via discuss
On 5/31/23 16:40, Max André Lamprecht via discuss wrote:
> Hi,
> 
> We noticed in a large OpenStack cluster (~500 chassis) that during a
> failover of a VIP (with a floating IP attached), it takes up to 50 seconds
> until the traffic gets routed to the correct port.
> This is caused by the slow Logical_Flow update in the southbound DB.
> Until that update happens, all FIP traffic is still forwarded to the
> previous VIP port.
> 
> inc_proc_eng|INFO|node: northd, recompute (forced) took 20502ms
> inc_proc_eng|INFO|node: lflow, recompute (forced) took 802ms
> 
> This lflow gets updated by northd. During debugging we discovered that northd
> spends a large amount of time recomputing ref_chassis in the
> HA_Chassis_Group table.
> 
> ovnsb_db_run (inlined)
> - handle_port_binding_changes (inlined)
>   - 10.53% build_ha_chassis_group_ref_chassis (inlined)
>       8.84% add_to_ha_ref_chassis_info (inlined)
>       0.69% hmap_next (inlined)
> 
> Maybe this is specific to our environment because we have a few external
> stretched L2 networks, each represented as a Logical_Switch.
> Many Logical_Router_Ports are attached to these Logical_Switches, e.g.
> ~3500 LRPs are attached to one LS.
> 
> compute VM -> internal net -> router -> external net (type=localnet) -> N
> routers
> C1 -> LS1 -> R1 -> LS2 -> R2..RN -> LS2..LSN -> C2..CN
> 
> Currently we can see that northd adds about 500 chassis to each ref_chassis
> column. I think that this is too much and not necessary. Please correct me if
> I'm wrong :)
> 
> If I understand correctly, ref_chassis is only used to decide where to
> establish the BFD sessions.
> Is there a reason why this needs to be referenced across chassisredirect
> ports and further?
> Would it make sense to stop the whole lookup process in
> build_lrouter_groups__() if we have an LRP with a chassisredirect port set?

Hi, Max.

I might have solved some of these issues while investigating a different
issue report:
  https://mail.openvswitch.org/pipermail/ovs-discuss/2023-August/052614.html

With these two patches:
  https://patchwork.ozlabs.org/project/ovn/patch/20230823214140.1779255-1-i.maxim...@ovn.org/
  https://patchwork.ozlabs.org/project/ovn/patch/20230823215705.1786348-1-i.maxim...@ovn.org/

Han pointed me to this thread, as it seems like the issue you're facing
is practically the same.

I agree though that there is a potential for even further improvement as at
least there should be a way to not duplicate all the chassis in each group.
But I hope that the patches above are enough for now.

Best regards, Ilya Maximets.


Re: [ovs-discuss] northd: amount of ref_chassis in ha_chassis_group

2023-06-05 Thread Ihtisham ul Haq via discuss
Hi OVN,

Maybe another question on top. :)

From [1], in the HA_Chassis_Group table:

   ref_chassis: set of weak reference to Chassis
      The set of Chassis that reference this HA chassis group. To
      determine the correct Chassis, find the chassisredirect type
      Port_Binding that references this HA_Chassis_Group. This
      Port_Binding is derived from some particular logical router.
      Starting from that LR, find the set of all logical switches and
      routers connected to it, directly or indirectly, across router
      ports that link one LRP to another or to a LSP. For each LSP in
      these logical switches, find the corresponding Port_Binding and
      add its bound Chassis (if any) to ref_chassis.

What is meant by "indirectly" in the above text? And why do we need to keep 
track of indirect connections?

For us, that results in a long (possibly full) list of compute chassis in 
`ref_chassis` for each LRP, which hurts us during recompute, as Max 
mentioned below. :)
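
To make the question concrete, here is a minimal Python sketch of the lookup
as the documentation text above describes it. It is only an illustration and
not the northd implementation: the function name, the toy topology, and the
chassis names are assumptions, and routers and switches are reduced to nodes
of an undirected graph.

```python
from collections import deque


def compute_ref_chassis(start_router, links, lsp_chassis):
    """Collect the chassis of every LSP reachable from start_router.

    links:       iterable of (router, switch) pairs, one per router port link
    lsp_chassis: switch name -> list of bound chassis per LSP (None = unbound)
    """
    # Build an undirected adjacency map from the router/switch links.
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    seen = {start_router}
    queue = deque([start_router])
    ref_chassis = set()
    while queue:
        node = queue.popleft()
        # Only switches appear in lsp_chassis; routers contribute nothing here.
        for chassis in lsp_chassis.get(node, ()):
            if chassis is not None:
                ref_chassis.add(chassis)
        # Follow every link, so indirectly connected switches count as well.
        for peer in neighbours.get(node, ()):
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return ref_chassis


# Hypothetical topology: three routers sharing one external switch.
links = [
    ("R1", "LS1"), ("R1", "LS_ext"),
    ("R2", "LS2"), ("R2", "LS_ext"),
    ("R3", "LS3"), ("R3", "LS_ext"),
]
lsp_chassis = {"LS1": ["C1"], "LS2": ["C2", None], "LS3": ["C3"]}

result = compute_ref_chassis("R1", links, lsp_chassis)
print(sorted(result))  # ['C1', 'C2', 'C3']
```

In this toy model, C2 and C3 end up in the set for R1 only because R2 and R3
are reachable through the shared external switch, which is how I read
"indirectly".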

Thanks!

[1] https://www.ovn.org/support/dist-docs/ovn-sb.5.html


Kind regards,
Ihtisham ul Haq



[ovs-discuss] northd: amount of ref_chassis in ha_chassis_group

2023-05-31 Thread Max André Lamprecht via discuss
Hi,

We noticed in a large OpenStack cluster (~500 chassis) that during a 
failover of a VIP (with a floating IP attached), it takes up to 50 seconds 
until the traffic gets routed to the correct port.
This is caused by the slow Logical_Flow update in the southbound DB. Until 
that update happens, all FIP traffic is still forwarded to the previous 
VIP port.

inc_proc_eng|INFO|node: northd, recompute (forced) took 20502ms
inc_proc_eng|INFO|node: lflow, recompute (forced) took 802ms
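
For anyone who wants to compare these numbers before and after a change, a
small Python sketch like the following could pull the recompute times out of
the log. The regular expression is an assumption based only on the two lines
above; the function name is made up, and real log lines may carry extra
prefixes, which is why it searches rather than matches from the start.

```python
import re

# Pattern derived from the two log lines quoted above (an assumption, not a
# documented log format).
RECOMPUTE_RE = re.compile(
    r"node: (?P<node>\w+), recompute \((?P<kind>[^)]*)\) took (?P<ms>\d+)ms"
)


def recompute_times(lines):
    """Return (node, kind, milliseconds) for every recompute log line."""
    times = []
    for line in lines:
        match = RECOMPUTE_RE.search(line)
        if match:
            times.append((match["node"], match["kind"], int(match["ms"])))
    return times


sample = [
    "inc_proc_eng|INFO|node: northd, recompute (forced) took 20502ms",
    "inc_proc_eng|INFO|node: lflow, recompute (forced) took 802ms",
]
times = recompute_times(sample)
total_ms = sum(ms for _, _, ms in times)
print(times)     # [('northd', 'forced', 20502), ('lflow', 'forced', 802)]
print(total_ms)  # 21304
```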

This lflow gets updated by northd. During debugging we discovered that northd 
spends a large amount of time recomputing ref_chassis in the 
HA_Chassis_Group table.

ovnsb_db_run (inlined)
- handle_port_binding_changes (inlined)
  - 10.53% build_ha_chassis_group_ref_chassis (inlined)
      8.84% add_to_ha_ref_chassis_info (inlined)
      0.69% hmap_next (inlined)

Maybe this is specific to our environment because we have a few external 
stretched L2 networks, each represented as a Logical_Switch.
Many Logical_Router_Ports are attached to these Logical_Switches, e.g. 
~3500 LRPs are attached to one LS.

compute VM -> internal net -> router -> external net (type=localnet) -> N routers
C1 -> LS1 -> R1 -> LS2 -> R2..RN -> LS2..LSN -> C2..CN

Currently we can see that northd adds about 500 chassis to each ref_chassis 
column. I think that this is too much and not necessary. Please correct me if 
I'm wrong :)

If I understand correctly, ref_chassis is only used to decide where to 
establish the BFD sessions.
Is there a reason why this needs to be referenced across chassisredirect ports 
and further?
Would it make sense to stop the whole lookup process in 
build_lrouter_groups__() if we have an LRP with a chassisredirect port set?
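
To illustrate the idea, here is a rough Python sketch that compares the
current "follow everything" lookup with one that refuses to cross router
ports that have a chassisredirect port. It is a toy model under stated
assumptions and not the code in build_lrouter_groups__(): the function name,
the `is_gateway` flag, the switch name LS_ext, and the five-router topology
are all made up, and whether stopping there is actually safe is exactly the
open question.

```python
def reachable_chassis(start, edges, chassis_on, stop_at_gateway=False):
    """Return the chassis reachable from start.

    edges:      list of (router, switch, is_gateway); is_gateway marks a
                router port that has a chassisredirect port (hypothetical flag)
    chassis_on: switch name -> chassis hosting ports of that switch
    """
    adjacency = {}
    for router, switch, is_gateway in edges:
        adjacency.setdefault(router, []).append((switch, is_gateway))
        adjacency.setdefault(switch, []).append((router, is_gateway))

    seen, stack, found = {start}, [start], set()
    while stack:
        node = stack.pop()
        found.update(chassis_on.get(node, ()))
        for peer, is_gateway in adjacency.get(node, ()):
            # Proposed change: do not walk across chassisredirect boundaries.
            if stop_at_gateway and is_gateway:
                continue
            if peer not in seen:
                seen.add(peer)
                stack.append(peer)
    return found


# N routers, each with one internal switch and a gateway port on LS_ext.
N = 5
edges, chassis_on = [], {}
for i in range(1, N + 1):
    edges.append((f"R{i}", f"LS{i}", False))   # internal network
    edges.append((f"R{i}", "LS_ext", True))    # external localnet network
    chassis_on[f"LS{i}"] = [f"C{i}"]

full = reachable_chassis("R1", edges, chassis_on)
bounded = reachable_chassis("R1", edges, chassis_on, stop_at_gateway=True)
print(sorted(full))     # ['C1', 'C2', 'C3', 'C4', 'C5']
print(sorted(bounded))  # ['C1']
```

In this model the full lookup for R1 grows with the number of routers on the
external switch, while the bounded one only contains the chassis behind R1
itself.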


Thanks for your time
Max