Re: [ovs-discuss] OVN SB DB from RAFT cluster to Relay DB

2024-03-19 Thread Han Zhou via discuss
On Thu, Mar 7, 2024 at 12:29 PM Sri kor via discuss <
ovs-discuss@openvswitch.org> wrote:
>
> Is there a way to configure ovn-controller to subscribe to only specific
> SB DB updates?
> I

Hi Sri,

As Felix mentioned, ovn-controller by default subscribes only to the SB DB
updates that it considers relevant to its hypervisor.
What is your setting for external_ids:ovn-monitor-all? (ovs-vsctl
--if-exists get open . external_ids:ovn-monitor-all)
Is there anything you want to tune beyond this?

Thanks,
Han

___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] OVN SB DB from RAFT cluster to Relay DB

2024-03-07 Thread Sri kor via discuss
Is there a way to configure ovn-controller to subscribe to only specific SB
DB updates?
I



Re: [ovs-discuss] OVN SB DB from RAFT cluster to Relay DB

2024-03-06 Thread Felix Huettner via discuss
On Wed, Mar 06, 2024 at 10:29:29AM +0300, Vladislav Odintsov wrote:
> Hi Felix,
> 
> > On 6 Mar 2024, at 10:16, Felix Huettner via discuss wrote:
> >
> > Hi Srini,
> >
> > I can share what works for us with ~1k hypervisors:
> > 
> > On Tue, Mar 05, 2024 at 09:51:43PM -0800, Sri kor via discuss wrote:
> >> Hi Team,
> >> 
> >> 
> >> Currently, we are using OVN in RAFT cluster mode. We have 3 NB and SB
> >> ovsdb-servers operating in RAFT cluster mode, and 500 hypervisors are
> >> connected to this RAFT cluster.
> >>
> >> For our next deployment, our scale will increase to 3000 hypervisors. To
> >> accommodate the additional hypervisors, we are migrating to DB relays
> >> with a multigroup deployment model. The relays help with OVN SB DB read
> >> transactions. But for write transactions, only the leader of the RAFT
> >> cluster can update the DB, which puts load on the leader. Is
> >> there a way to reduce the load on the RAFT cluster leader?
> > 
> > We do the following:
> > * If you need TLS on the ovsdb path, move it out to a separate
> >  reverse proxy that does only L4 TLS termination (e.g. traefik)
> 
> Do I understand correctly that with such TLS "offload" you can’t use RBAC for 
> hypervisors?
> 

Yes, that is the unfortunate side effect.



Re: [ovs-discuss] OVN SB DB from RAFT cluster to Relay DB

2024-03-05 Thread Vladislav Odintsov via discuss
Hi Felix,

> On 6 Mar 2024, at 10:16, Felix Huettner via discuss wrote:
>
> Hi Srini,
>
> I can share what works for us with ~1k hypervisors:
> 
> On Tue, Mar 05, 2024 at 09:51:43PM -0800, Sri kor via discuss wrote:
>> Hi Team,
>> 
>> 
>> Currently, we are using OVN in RAFT cluster mode. We have 3 NB and SB
>> ovsdb-servers operating in RAFT cluster mode, and 500 hypervisors are
>> connected to this RAFT cluster.
>>
>> For our next deployment, our scale will increase to 3000 hypervisors. To
>> accommodate the additional hypervisors, we are migrating to DB relays
>> with a multigroup deployment model. The relays help with OVN SB DB read
>> transactions. But for write transactions, only the leader of the RAFT
>> cluster can update the DB, which puts load on the leader. Is
>> there a way to reduce the load on the RAFT cluster leader?
> 
> We do the following:
> * If you need TLS on the ovsdb path, move it out to a separate
>  reverse proxy that does only L4 TLS termination (e.g. traefik)

Do I understand correctly that with such TLS "offload" you can’t use RBAC for 
hypervisors?



Regards,
Vladislav Odintsov



Re: [ovs-discuss] OVN SB DB from RAFT cluster to Relay DB

2024-03-05 Thread Felix Huettner via discuss
Hi Srini,

I can share what works for us with ~1k hypervisors:

On Tue, Mar 05, 2024 at 09:51:43PM -0800, Sri kor via discuss wrote:
> Hi Team,
> 
> 
> Currently, we are using OVN in RAFT cluster mode. We have 3 NB and SB
> ovsdb-servers operating in RAFT cluster mode, and 500 hypervisors are
> connected to this RAFT cluster.
>
> For our next deployment, our scale will increase to 3000 hypervisors. To
> accommodate the additional hypervisors, we are migrating to DB relays
> with a multigroup deployment model. The relays help with OVN SB DB read
> transactions. But for write transactions, only the leader of the RAFT
> cluster can update the DB, which puts load on the leader. Is
> there a way to reduce the load on the RAFT cluster leader?

We do the following:
* If you need TLS on the ovsdb path, move it out to a separate
  reverse proxy that does only L4 TLS termination (e.g. traefik).
* Have nobody besides northd connect to the SB DB directly; everyone
  else needs to use a relay.
* Do not run backups on the cluster leader, but on one of the current
  followers.
* Increase the RAFT election timeout significantly (we use 120s).
  However, AFAIK there is a patch in 3.3 that improves this.
* If you generate metrics or similar from database content, generate
  them on the relays instead of the RAFT cluster.
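
On the election timeout item: as far as I know, the timer is changed at
runtime with `ovs-appctl ... cluster/change-election-timer <db> <ms>` on the
leader, and each call may at most double the current value, so getting from
the 1000 ms default to 120s takes several calls. Below is a minimal Python
sketch of that step calculation. The doubling rule and the default are stated
from memory of the ovsdb-server manual, and the control socket path is an
assumption that depends on your packaging, so please verify both before use.

```python
def election_timer_steps(current_ms, target_ms):
    """Return successive timer values, each at most double the previous one."""
    if current_ms <= 0 or target_ms <= 0:
        raise ValueError("timer values must be positive")
    steps = []
    while current_ms < target_ms:
        # Assumed rule: a single change may not exceed 2x the current value.
        current_ms = min(current_ms * 2, target_ms)
        steps.append(current_ms)
    return steps


def change_timer_commands(steps, db="OVN_Southbound",
                          ctl="/var/run/ovn/ovnsb_db.ctl"):
    """Build one ovs-appctl argv per step; the ctl path is an assumption."""
    return [
        ["ovs-appctl", "-t", ctl, "cluster/change-election-timer", db, str(ms)]
        for ms in steps
    ]


if __name__ == "__main__":
    # Print the commands instead of running them; they belong on the leader.
    for argv in change_timer_commands(election_timer_steps(1000, 120000)):
        print(" ".join(argv))
```

Each printed command would be run on the current leader, one after the
other, giving the cluster time to commit a change before issuing the next.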

Overall, when our southbound DB had issues, most of the time the cause was
some client constantly reconnecting to it and thereby always pulling a full
DB dump.

> 
> 
> As the scale increases, the number of updates coming to ovn-controller from
> the OVN SB DB increases, which puts pressure on ovn-controller. Is there a
> way to minimize the load on ovn-controller?

We have not seen any issues there so far.
However, if you are using Python tooling outside of OVN (e.g.
OpenStack), ensure that the ovs library has JSON parsing backed by a C
library available. This brings significant performance benefits if you have
a lot of updates.
You can check with `python3 -c "import ovs.json; print(ovs.json.PARSER)"`,
which should print "C".
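
The same check can be scripted, for example as part of a deployment health
check. This is only a sketch: `ovs_json_parser` and `parser_report` are
hypothetical helper names, and the only thing taken from the ovs library is
the `ovs.json.PARSER` attribute used in the one-liner above.

```python
import importlib


def ovs_json_parser():
    """Return ovs.json.PARSER, or None if the ovs package is not importable."""
    try:
        ovs_json = importlib.import_module("ovs.json")
    except ImportError:
        return None
    return getattr(ovs_json, "PARSER", None)


def parser_report(parser):
    """Turn the parser value into a short human-readable status line."""
    if parser is None:
        return "ovs Python library not found"
    if parser == "C":
        return "ovs.json uses the C extension"
    return "ovs.json uses a non-C parser: %s" % parser


if __name__ == "__main__":
    print(parser_report(ovs_json_parser()))
```

Anything other than the C result suggests that the ovs Python package was
installed without its compiled JSON extension.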

> 
> I wish there were a way for ovn-controller to subscribe only to updates
> specific to its hypervisor. Are there any known ovn-controller subscription
> methods available and in use by the OVS community?

Yes, ovn-controller does that by default. However, we saw that this creates
increased load on the relays due to the additional filtering and
JSON serialization needed per target node. So we turned it off and thereby
accept more network bandwidth in exchange for less ovsdb load.
The relevant setting is `external_ids:ovn-monitor-all`.
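
For reference, here is a small Python sketch for inspecting and changing that
setting on a hypervisor. The `get` form is the one quoted elsewhere in this
thread; the `set` form is ordinary ovs-vsctl syntax. My understanding is that
a value of true makes ovn-controller monitor all SB records (i.e. turns the
per-hypervisor filtering off), but please check the ovn-controller manual for
your version. The helper names are hypothetical.

```python
import shutil
import subprocess

KEY = "external_ids:ovn-monitor-all"


def get_monitor_all_argv():
    """argv that reads the setting without failing if the key is unset."""
    return ["ovs-vsctl", "--if-exists", "get", "open", ".", KEY]


def set_monitor_all_argv(enabled):
    """argv that sets the key to true or false on the local Open_vSwitch row."""
    value = "true" if enabled else "false"
    return ["ovs-vsctl", "set", "open", ".", "%s=%s" % (KEY, value)]


def read_monitor_all():
    """Run the get command; return its output, or None without ovs-vsctl."""
    if shutil.which("ovs-vsctl") is None:
        return None
    result = subprocess.run(get_monitor_all_argv(), capture_output=True,
                            text=True, check=False)
    return result.stdout.strip()


if __name__ == "__main__":
    print("current value:", read_monitor_all())
    print("to enable:", " ".join(set_monitor_all_argv(True)))
```

The script only reads the current value and prints the command that would
change it, so it is safe to run on a production hypervisor.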

Thanks
Felix

> 
> 
> How can I optimize the load on the leader node in an OVN RAFT cluster to
> handle increased write transactions?
> 
> 
> 
> Thanks,
> 
> Srini



[ovs-discuss] OVN SB DB from RAFT cluster to Relay DB

2024-03-05 Thread Sri kor via discuss
Hi Team,


Currently, we are using OVN in RAFT cluster mode. We have 3 NB and SB
ovsdb-servers operating in RAFT cluster mode, and 500 hypervisors are
connected to this RAFT cluster.

For our next deployment, our scale will increase to 3000 hypervisors. To
accommodate the additional hypervisors, we are migrating to DB relays
with a multigroup deployment model. The relays help with OVN SB DB read
transactions. But for write transactions, only the leader of the RAFT
cluster can update the DB, which puts load on the leader. Is
there a way to reduce the load on the RAFT cluster leader?


As the scale increases, the number of updates coming to ovn-controller from
the OVN SB DB increases, which puts pressure on ovn-controller. Is there a
way to minimize the load on ovn-controller?

I wish there were a way for ovn-controller to subscribe only to updates
specific to its hypervisor. Are there any known ovn-controller subscription
methods available and in use by the OVS community?


How can I optimize the load on the leader node in an OVN RAFT cluster to
handle increased write transactions?



Thanks,

Srini