Re: [ovs-discuss] Scaling OVN/Southbound

2023-07-07 Thread Ilya Maximets via discuss
On 7/5/23 18:00, Felix Huettner wrote:
> Hi Han,
> 
> On Fri, Jun 30, 2023 at 05:08:36PM -0700, Han Zhou wrote:
>> On Wed, May 24, 2023 at 12:26 AM Felix Huettner via discuss <
>> ovs-discuss@openvswitch.org> wrote:
>>>
>>> Hi Ilya,
>>>
>>> thank you for the detailed reply
>>>
>>> On Tue, May 23, 2023 at 05:25:49PM +0200, Ilya Maximets wrote:
 On 5/23/23 15:59, Felix Hüttner via discuss wrote:
> Hi everyone,

 Hi, Felix.

>
> we are currently running an OVN Deployment with 450 Nodes. We run a 3
>> node cluster for the northbound database and a 3 nodes cluster for the
>> southbound database.
> Between the southbound cluster and the ovn-controllers we have a
>> layer of 24 ovsdb relays.
> The setup is using TLS for all connections, however the TLS Server is
>> handled by a traefik reverseproxy to offload this from the ovsdb

 The very important part of the system description is what versions
 of OVS and OVN are you using in this setup?  If it's not latest
 3.1 and 23.03, then it's hard to talk about what/if performance
 improvements are actually needed.

>>>
>>> We are currently running ovs 3.1 and ovn 22.12 (in the process of
>>> upgrading to 23.03). `monitor-all` is currently disabled, but we want to
>>> try that as well.
>>>
>> Hi Felix, did you try upgrading and enabling "monitor-all"? How does it
>> look now?
> 
> we did not yet upgrade, but we tried monitor-all and that provided a big
> benefit in terms of stability.
> 
>>
> Northd and Neutron is connecting directly to north- and southbound
>> databases without the relays.

 One of the big things that is annoying is that Neutron connects to
 Southbound database at all.  There are some reasons to do that,
 but ideally that should be avoided.  I know that in the past limiting
 the number of metadata agents was one of the mitigation strategies
 for scaling issues.  Also, why can't it connect to relays?  There
 shouldn't be too many transactions flowing towards Southbound DB
 from the Neutron.

>>>
>>> Thanks for that suggestion, that definitely makes sense.
>>>
>> Does this make a big difference? How many Neutron - SB connections are
>> there?
>> What rings a bell is that Neutron is using the python OVSDB library which
>> hasn't implemented the fast-resync feature (if I remember correctly).
>> At the same time, there is the feature leader-transfer-for-snapshot, which
>> automatically transfer leader whenever a snapshot is to be written, which
>> would happen frequently if your environment is very active.
>> When a leader transfer happens, if Neutron set the option "leader-only"
>> (only connects to leader) to SB DB (could someone confirm?), then when the
>> leader transfer happens, all Neutron workers would reconnect to the new
>> leader. With fast-resync, like what's implemented in C IDL and Go, the
>> client that has cached the data would only request the delta when
>> reconnecting. But since the python lib doesn't have this, the Neutron
>> server would re-download full data when reconnecting ...
>> This is a speculation based on the information I have, and the assumptions
>> need to be confirmed.
> 
> We are currently working with upstream neutron to get the leader-only
> flag removed wherever we can. I guess in total the amount of connections
> depends on the process count which would be ~150 connections in total in
> our case.
> 
>>
>
> We needed to increase various timeouts on the ovsdb-server and client
>> side to get this to a mostly stable state:
> * inactivity probes of 60 seconds (for all connections between
>> ovsdb-server, relay and clients)
> * cluster election time of 50 seconds
>
> As long as none of the relays restarts the environment is quite
>> stable.
> However we see quite regularly the "Unreasonably long xxx ms poll
>> interval" messages ranging from 1000ms up to 4ms.

 With latest versions of OVS/OVN the CPU usage on Southbound DB
 servers without relays in our weekly 500-node ovn-heater runs
 stays below 10% during the test phase.  No large poll intervals
 are getting registered.

 Do you have more details on under which circumstances these
 large poll intervals occur?

>>>
>>> It seems to mostly happen on the initial connection of some client to
>>> the ovsdb. From the few times we ran perf there it looks like the time
>>> is spent in creating a monitor and during that sending out the updates
>>> to the client side.
>>>
>> It is one of the worst case scenario for OVSDB when many clients initialize
>> connections to it at the same time, when the size of data downloaded by
>> each client is big.
>> OVSDB relay, for what I understand, should greatly help on this. You have
>> 24 relay nodes, which are supposed to share the burden. Are the SB DB and
>> the relay instances running with sufficient CPU resources?
>> Is it clear that initial connections from which clients 

Re: [ovs-discuss] Scaling OVN/Southbound

2023-07-07 Thread Felix Huettner via discuss
Hi Han,

On Fri, Jul 07, 2023 at 02:04:24PM +0800, Han Zhou via discuss wrote:
> On Thu, Jul 6, 2023 at 12:00 AM Felix Huettner 
> wrote:
> >
> > Hi Han,
> >
> > On Fri, Jun 30, 2023 at 05:08:36PM -0700, Han Zhou wrote:
> > > On Wed, May 24, 2023 at 12:26 AM Felix Huettner via discuss <
> > > ovs-discuss@openvswitch.org> wrote:
> > > >
> > > > Hi Ilya,
> > > >
> > > > thank you for the detailed reply
> > > >
> > > > On Tue, May 23, 2023 at 05:25:49PM +0200, Ilya Maximets wrote:
> > > > > On 5/23/23 15:59, Felix Hüttner via discuss wrote:
> > > > > > Hi everyone,
> > > > >
> > > > > Hi, Felix.
> > > > >
> > > > > >
> > > > > > we are currently running an OVN Deployment with 450 Nodes. We run
> a 3
> > > node cluster for the northbound database and a 3 nodes cluster for the
> > > southbound database.
> > > > > > Between the southbound cluster and the ovn-controllers we have a
> > > layer of 24 ovsdb relays.
> > > > > > The setup is using TLS for all connections, however the TLS
> Server is
> > > handled by a traefik reverseproxy to offload this from the ovsdb
> > > > >
> > > > > The very important part of the system description is what versions
> > > > > of OVS and OVN are you using in this setup?  If it's not latest
> > > > > 3.1 and 23.03, then it's hard to talk about what/if performance
> > > > > improvements are actually needed.
> > > > >
> > > >
> > > > We are currently running ovs 3.1 and ovn 22.12 (in the process of
> > > > upgrading to 23.03). `monitor-all` is currently disabled, but we want
> to
> > > > try that as well.
> > > >
> > > Hi Felix, did you try upgrading and enabling "monitor-all"? How does it
> > > look now?
> >
> > we did not yet upgrade, but we tried monitor-all and that provided a big
> > benefit in terms of stability.
> >
> It is great to know that monitor-all helped for your use case.
>
> > >
> > > > > > Northd and Neutron is connecting directly to north- and southbound
> > > databases without the relays.
> > > > >
> > > > > One of the big things that is annoying is that Neutron connects to
> > > > > Southbound database at all.  There are some reasons to do that,
> > > > > but ideally that should be avoided.  I know that in the past
> limiting
> > > > > the number of metadata agents was one of the mitigation strategies
> > > > > for scaling issues.  Also, why can't it connect to relays?  There
> > > > > shouldn't be too many transactions flowing towards Southbound DB
> > > > > from the Neutron.
> > > > >
> > > >
> > > > Thanks for that suggestion, that definitely makes sense.
> > > >
> > > Does this make a big difference? How many Neutron - SB connections are
> > > there?
> > > What rings a bell is that Neutron is using the python OVSDB library
> which
> > > hasn't implemented the fast-resync feature (if I remember correctly).
> > > At the same time, there is the feature leader-transfer-for-snapshot,
> which
> > > automatically transfer leader whenever a snapshot is to be written,
> which
> > > would happen frequently if your environment is very active.
> > > When a leader transfer happens, if Neutron set the option "leader-only"
> > > (only connects to leader) to SB DB (could someone confirm?), then when
> the
> > > leader transfer happens, all Neutron workers would reconnect to the new
> > > leader. With fast-resync, like what's implemented in C IDL and Go, the
> > > client that has cached the data would only request the delta when
> > > reconnecting. But since the python lib doesn't have this, the Neutron
> > > server would re-download full data when reconnecting ...
> > > This is a speculation based on the information I have, and the
> assumptions
> > > need to be confirmed.
> >
> > We are currently working with upstream neutron to get the leader-only
> > flag removed wherever we can. I guess in total the amount of connections
> > depends on the process count which would be ~150 connections in total in
> > our case.
> >
> As Terry pointed out, the python OVSDB lib does support fast-resync,
> so this shouldn't be the problem, and I think it is better to keep the
> leader-only flag for Neutron because it is more efficient to update
> directly through the leader, especially when the client writes heavily.
> Without leader-only, updates sent to followers still have to go through
> the leader anyway, and parallel updates will result in sequence
> conflicts and retries, which creates more waste and load on the
> servers. But of course, it does no harm to try. I didn't expect that you
> have so many (~150) connections just from Neutron (I thought it might be
> 10+), which seems big enough to create significant load on the server,
> especially when many of them restart at the same time, such as during an
> upgrade.

I think we need to keep the leader-only for the northbound connection
anyway, as it relies on some kind of locking there.
But for the southbound connection we can probably get rid of it (and
there we do not really have write load).
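
For reference, the difference is a single flag in the python-ovs IDL. A
minimal sketch of what we have in mind (assuming python-ovs >= 2.17 and its
Idl leader_only argument; schema paths and endpoints below are placeholders,
not our real configuration):

import ovs.db.idl

NB_SCHEMA = "/usr/share/ovn/ovn-nb.ovsschema"
SB_SCHEMA = "/usr/share/ovn/ovn-sb.ovsschema"

# Northbound: keep leader_only=True, since the OVSDB locks we rely on
# only make sense against the raft leader.
nb_helper = ovs.db.idl.SchemaHelper(NB_SCHEMA)
nb_helper.register_all()
nb_idl = ovs.db.idl.Idl("ssl:nb.example.net:6641", nb_helper,
                        leader_only=True)

# Southbound: read-mostly traffic can go to any server (or a relay), so
# leader_only=False avoids a reconnect storm on every leadership change.
sb_helper = ovs.db.idl.SchemaHelper(SB_SCHEMA)
sb_helper.register_table("Chassis")
sb_idl = ovs.db.idl.Idl("ssl:sb-relay.example.net:6642", sb_helper,
                        leader_only=False)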

>
> > >
> > > > > >
> 

Re: [ovs-discuss] Scaling OVN/Southbound

2023-07-07 Thread Han Zhou via discuss
On Fri, Jul 7, 2023 at 1:21 PM Han Zhou  wrote:
>
>
>
> On Thu, Jul 6, 2023 at 1:28 AM Terry Wilson  wrote:
> >
> > On Wed, Jul 5, 2023 at 9:59 AM Terry Wilson  wrote:
> > >
> > > On Fri, Jun 30, 2023 at 7:09 PM Han Zhou via discuss
> > >  wrote:
> > > >
> > > >
> > > >
> > > > On Wed, May 24, 2023 at 12:26 AM Felix Huettner via discuss <
ovs-discuss@openvswitch.org> wrote:
> > > > >
> > > > > Hi Ilya,
> > > > >
> > > > > thank you for the detailed reply
> > > > >
> > > > > On Tue, May 23, 2023 at 05:25:49PM +0200, Ilya Maximets wrote:
> > > > > > On 5/23/23 15:59, Felix Hüttner via discuss wrote:
> > > > > > > Hi everyone,
> > > > > >
> > > > > > Hi, Felix.
> > > > > >
> > > > > > >
> > > > > > > we are currently running an OVN Deployment with 450 Nodes. We
run a 3 node cluster for the northbound database and a 3 nodes cluster for
the southbound database.
> > > > > > > Between the southbound cluster and the ovn-controllers we
have a layer of 24 ovsdb relays.
> > > > > > > The setup is using TLS for all connections, however the TLS
Server is handled by a traefik reverseproxy to offload this from the ovsdb
> > > > > >
> > > > > > The very important part of the system description is what
versions
> > > > > > of OVS and OVN are you using in this setup?  If it's not latest
> > > > > > 3.1 and 23.03, then it's hard to talk about what/if performance
> > > > > > improvements are actually needed.
> > > > > >
> > > > >
> > > > > We are currently running ovs 3.1 and ovn 22.12 (in the process of
> > > > > upgrading to 23.03). `monitor-all` is currently disabled, but we
want to
> > > > > try that as well.
> > > > >
> > > > Hi Felix, did you try upgrading and enabling "monitor-all"? How
does it look now?
> > > >
> > > > > > > Northd and Neutron is connecting directly to north- and
southbound databases without the relays.
> > > > > >
> > > > > > One of the big things that is annoying is that Neutron connects
to
> > > > > > Southbound database at all.  There are some reasons to do that,
> > > > > > but ideally that should be avoided.  I know that in the past
limiting
> > > > > > the number of metadata agents was one of the mitigation
strategies
> > > > > > for scaling issues.  Also, why can't it connect to relays?
There
> > > > > > shouldn't be too many transactions flowing towards Southbound DB
> > > > > > from the Neutron.
> > > > > >
> > > > >
> > > > > Thanks for that suggestion, that definitely makes sense.
> > > > >
> > > > Does this make a big difference? How many Neutron - SB connections
are there?
> > > > What rings a bell is that Neutron is using the python OVSDB library
which hasn't implemented the fast-resync feature (if I remember correctly).
> > >
> > > python-ovs has supported monitor_cond_since since v2.17.0 (though
> > > there may have been a bug that was fixed in 2.17.1). If fast resync
> > > isn't happening, then it should be considered a bug. With that said, I
> > > remember when I looked it a year or two ago, ovsdb-server didn't
> > > really use fast resync/monitor_cond_since unless it was running in
> > > raft cluster mode (it would reply, but with the last-txn-id as 0
> > > IIRC?). Does the ovsdb-relay code actually return the last-txn-id? I
> > > can set up an environment and run some tests, but maybe someone else
> > > already knows.
> >
> > Looks like ovsdb-relay does support last-txn-id now:
> >
https://github.com/openvswitch/ovs/commit/a3e97b1af1bdcaa802c6caa9e73087df7077d2b1
,
> > but only in v3.0+.
> >
>
> Hi Terry, thanks for correcting me, and sorry for my bad memory! And you
are right that fast resync is supported only in cluster mode.
>
> Han
>
> > > > At the same time, there is the feature
leader-transfer-for-snapshot, which automatically transfer leader whenever
a snapshot is to be written, which would happen frequently if your
environment is very active.
> > >
> > > I believe snapshot should only be happening "no less frequently than
> > > 24 hours, with snapshots if there are more than 100 log entries and
> > > the log size has doubled, but no more frequently than every 10 mins"
> > > or something pretty close to that. So it seems like once the system
> > > got up to its expected size, you would just see updates every 24 hours
> > > since you obviously can't double in size forever. But it's possible
> > > I'm reading that wrong.
> > >
Sorry I forgot to comment on this. It is actually not this way. Suppose you
have a database of size N after a snapshot/compaction, but there are tens of
transactions per second (delete, add, update, etc.); then the log grows
quickly and may take just several minutes to double in size (even though the
actual DB data does not), so it will soon trigger another snapshot, which
reduces the log size back to N. So it is not uncommon to see snapshots every
10 minutes.
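
To make it concrete, here is a rough Python sketch of the snapshot policy as
described above (just a paraphrase; the real logic lives in ovsdb/storage.c
and may differ in detail):

# Paraphrase of the snapshot policy quoted earlier in this thread.
MIN_INTERVAL_S = 10 * 60        # not more often than every 10 minutes
MAX_INTERVAL_S = 24 * 60 * 60   # at least once every 24 hours
MIN_NEW_ENTRIES = 100

def should_snapshot(elapsed_s, new_log_entries, log_bytes, base_bytes):
    """elapsed_s: seconds since the last snapshot; base_bytes: the log
    size right after that snapshot."""
    if elapsed_s < MIN_INTERVAL_S:
        return False
    if elapsed_s >= MAX_INTERVAL_S:
        return True
    # On a busy cluster the log doubles in minutes even though the
    # snapshotted data stays roughly the same size, so this test fires
    # again as soon as the 10 minute floor allows it.
    return new_log_entries > MIN_NEW_ENTRIES and log_bytes >= 2 * base_bytes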

Regards,
Han

> > > > When a leader transfer happens, if Neutron set the option
"leader-only" (only connects to leader) to SB DB (could someone confirm?),
then when the leader transfer happens, 

Re: [ovs-discuss] Scaling OVN/Southbound

2023-07-07 Thread Han Zhou via discuss
On Thu, Jul 6, 2023 at 12:00 AM Felix Huettner 
wrote:
>
> Hi Han,
>
> On Fri, Jun 30, 2023 at 05:08:36PM -0700, Han Zhou wrote:
> > On Wed, May 24, 2023 at 12:26 AM Felix Huettner via discuss <
> > ovs-discuss@openvswitch.org> wrote:
> > >
> > > Hi Ilya,
> > >
> > > thank you for the detailed reply
> > >
> > > On Tue, May 23, 2023 at 05:25:49PM +0200, Ilya Maximets wrote:
> > > > On 5/23/23 15:59, Felix Hüttner via discuss wrote:
> > > > > Hi everyone,
> > > >
> > > > Hi, Felix.
> > > >
> > > > >
> > > > > we are currently running an OVN Deployment with 450 Nodes. We run
a 3
> > node cluster for the northbound database and a 3 nodes cluster for the
> > southbound database.
> > > > > Between the southbound cluster and the ovn-controllers we have a
> > layer of 24 ovsdb relays.
> > > > > The setup is using TLS for all connections, however the TLS
Server is
> > handled by a traefik reverseproxy to offload this from the ovsdb
> > > >
> > > > The very important part of the system description is what versions
> > > > of OVS and OVN are you using in this setup?  If it's not latest
> > > > 3.1 and 23.03, then it's hard to talk about what/if performance
> > > > improvements are actually needed.
> > > >
> > >
> > > We are currently running ovs 3.1 and ovn 22.12 (in the process of
> > > upgrading to 23.03). `monitor-all` is currently disabled, but we want
to
> > > try that as well.
> > >
> > Hi Felix, did you try upgrading and enabling "monitor-all"? How does it
> > look now?
>
> we did not yet upgrade, but we tried monitor-all and that provided a big
> benefit in terms of stability.
>
It is great to know that monitor-all helped for your use case.

> >
> > > > > Northd and Neutron is connecting directly to north- and southbound
> > databases without the relays.
> > > >
> > > > One of the big things that is annoying is that Neutron connects to
> > > > Southbound database at all.  There are some reasons to do that,
> > > > but ideally that should be avoided.  I know that in the past
limiting
> > > > the number of metadata agents was one of the mitigation strategies
> > > > for scaling issues.  Also, why can't it connect to relays?  There
> > > > shouldn't be too many transactions flowing towards Southbound DB
> > > > from the Neutron.
> > > >
> > >
> > > Thanks for that suggestion, that definitely makes sense.
> > >
> > Does this make a big difference? How many Neutron - SB connections are
> > there?
> > What rings a bell is that Neutron is using the python OVSDB library
which
> > hasn't implemented the fast-resync feature (if I remember correctly).
> > At the same time, there is the feature leader-transfer-for-snapshot,
which
> > automatically transfer leader whenever a snapshot is to be written,
which
> > would happen frequently if your environment is very active.
> > When a leader transfer happens, if Neutron set the option "leader-only"
> > (only connects to leader) to SB DB (could someone confirm?), then when
the
> > leader transfer happens, all Neutron workers would reconnect to the new
> > leader. With fast-resync, like what's implemented in C IDL and Go, the
> > client that has cached the data would only request the delta when
> > reconnecting. But since the python lib doesn't have this, the Neutron
> > server would re-download full data when reconnecting ...
> > This is a speculation based on the information I have, and the
assumptions
> > need to be confirmed.
>
> We are currently working with upstream neutron to get the leader-only
> flag removed wherever we can. I guess in total the amount of connections
> depends on the process count which would be ~150 connections in total in
> our case.
>
As Terry pointed out, the python OVSDB lib does support fast-resync,
so this shouldn't be the problem, and I think it is better to keep the
leader-only flag for Neutron because it is more efficient to update
directly through the leader, especially when the client writes heavily.
Without leader-only, updates sent to followers still have to go through
the leader anyway, and parallel updates will result in sequence conflicts
and retries, which creates more waste and load on the servers. But of
course, it does no harm to try. I didn't expect that you have so many
(~150) connections just from Neutron (I thought it might be 10+), which
seems big enough to create significant load on the server, especially
when many of them restart at the same time, such as during an upgrade.

> >
> > > > >
> > > > > We needed to increase various timeouts on the ovsdb-server and
client
> > side to get this to a mostly stable state:
> > > > > * inactivity probes of 60 seconds (for all connections between
> > ovsdb-server, relay and clients)
> > > > > * cluster election time of 50 seconds
> > > > >
> > > > > As long as none of the relays restarts the environment is quite
> > stable.
> > > > > However we see quite regularly the "Unreasonably long xxx ms poll
> > interval" messages ranging from 1000ms up to 4ms.
> > > >
> > 

Re: [ovs-discuss] Scaling OVN/Southbound

2023-07-06 Thread Han Zhou via discuss
On Thu, Jul 6, 2023 at 1:28 AM Terry Wilson  wrote:
>
> On Wed, Jul 5, 2023 at 9:59 AM Terry Wilson  wrote:
> >
> > On Fri, Jun 30, 2023 at 7:09 PM Han Zhou via discuss
> >  wrote:
> > >
> > >
> > >
> > > On Wed, May 24, 2023 at 12:26 AM Felix Huettner via discuss <
ovs-discuss@openvswitch.org> wrote:
> > > >
> > > > Hi Ilya,
> > > >
> > > > thank you for the detailed reply
> > > >
> > > > On Tue, May 23, 2023 at 05:25:49PM +0200, Ilya Maximets wrote:
> > > > > On 5/23/23 15:59, Felix Hüttner via discuss wrote:
> > > > > > Hi everyone,
> > > > >
> > > > > Hi, Felix.
> > > > >
> > > > > >
> > > > > > we are currently running an OVN Deployment with 450 Nodes. We
run a 3 node cluster for the northbound database and a 3 nodes cluster for
the southbound database.
> > > > > > Between the southbound cluster and the ovn-controllers we have
a layer of 24 ovsdb relays.
> > > > > > The setup is using TLS for all connections, however the TLS
Server is handled by a traefik reverseproxy to offload this from the ovsdb
> > > > >
> > > > > The very important part of the system description is what versions
> > > > > of OVS and OVN are you using in this setup?  If it's not latest
> > > > > 3.1 and 23.03, then it's hard to talk about what/if performance
> > > > > improvements are actually needed.
> > > > >
> > > >
> > > > We are currently running ovs 3.1 and ovn 22.12 (in the process of
> > > > upgrading to 23.03). `monitor-all` is currently disabled, but we
want to
> > > > try that as well.
> > > >
> > > Hi Felix, did you try upgrading and enabling "monitor-all"? How does
it look now?
> > >
> > > > > > Northd and Neutron is connecting directly to north- and
southbound databases without the relays.
> > > > >
> > > > > One of the big things that is annoying is that Neutron connects to
> > > > > Southbound database at all.  There are some reasons to do that,
> > > > > but ideally that should be avoided.  I know that in the past
limiting
> > > > > the number of metadata agents was one of the mitigation strategies
> > > > > for scaling issues.  Also, why can't it connect to relays?  There
> > > > > shouldn't be too many transactions flowing towards Southbound DB
> > > > > from the Neutron.
> > > > >
> > > >
> > > > Thanks for that suggestion, that definitely makes sense.
> > > >
> > > Does this make a big difference? How many Neutron - SB connections
are there?
> > > What rings a bell is that Neutron is using the python OVSDB library
which hasn't implemented the fast-resync feature (if I remember correctly).
> >
> > python-ovs has supported monitor_cond_since since v2.17.0 (though
> > there may have been a bug that was fixed in 2.17.1). If fast resync
> > isn't happening, then it should be considered a bug. With that said, I
> > remember when I looked it a year or two ago, ovsdb-server didn't
> > really use fast resync/monitor_cond_since unless it was running in
> > raft cluster mode (it would reply, but with the last-txn-id as 0
> > IIRC?). Does the ovsdb-relay code actually return the last-txn-id? I
> > can set up an environment and run some tests, but maybe someone else
> > already knows.
>
> Looks like ovsdb-relay does support last-txn-id now:
>
https://github.com/openvswitch/ovs/commit/a3e97b1af1bdcaa802c6caa9e73087df7077d2b1
,
> but only in v3.0+.
>

Hi Terry, thanks for correcting me, and sorry for my bad memory! And you
are right that fast resync is supported only in cluster mode.

Han

> > > At the same time, there is the feature leader-transfer-for-snapshot,
which automatically transfer leader whenever a snapshot is to be written,
which would happen frequently if your environment is very active.
> >
> > I believe snapshot should only be happening "no less frequently than
> > 24 hours, with snapshots if there are more than 100 log entries and
> > the log size has doubled, but no more frequently than every 10 mins"
> > or something pretty close to that. So it seems like once the system
> > got up to its expected size, you would just see updates every 24 hours
> > since you obviously can't double in size forever. But it's possible
> > I'm reading that wrong.
> >
> > > When a leader transfer happens, if Neutron set the option
"leader-only" (only connects to leader) to SB DB (could someone confirm?),
then when the leader transfer happens, all Neutron workers would reconnect
to the new leader. With fast-resync, like what's implemented in C IDL and
Go, the client that has cached the data would only request the delta when
reconnecting. But since the python lib doesn't have this, the Neutron
server would re-download full data when reconnecting ...
> > > This is a speculation based on the information I have, and the
assumptions need to be confirmed.
> > >
> > > > > >
> > > > > > We needed to increase various timeouts on the ovsdb-server and
client side to get this to a mostly stable state:
> > > > > > * inactivity probes of 60 seconds (for all connections between
ovsdb-server, relay and clients)
> > > > > > * cluster election 

Re: [ovs-discuss] Scaling OVN/Southbound

2023-07-05 Thread Terry Wilson via discuss
On Wed, Jul 5, 2023 at 9:59 AM Terry Wilson  wrote:
>
> On Fri, Jun 30, 2023 at 7:09 PM Han Zhou via discuss
>  wrote:
> >
> >
> >
> > On Wed, May 24, 2023 at 12:26 AM Felix Huettner via discuss 
> >  wrote:
> > >
> > > Hi Ilya,
> > >
> > > thank you for the detailed reply
> > >
> > > On Tue, May 23, 2023 at 05:25:49PM +0200, Ilya Maximets wrote:
> > > > On 5/23/23 15:59, Felix Hüttner via discuss wrote:
> > > > > Hi everyone,
> > > >
> > > > Hi, Felix.
> > > >
> > > > >
> > > > > we are currently running an OVN Deployment with 450 Nodes. We run a 3 
> > > > > node cluster for the northbound database and a 3 nodes cluster for 
> > > > > the southbound database.
> > > > > Between the southbound cluster and the ovn-controllers we have a 
> > > > > layer of 24 ovsdb relays.
> > > > > The setup is using TLS for all connections, however the TLS Server is 
> > > > > handled by a traefik reverseproxy to offload this from the ovsdb
> > > >
> > > > The very important part of the system description is what versions
> > > > of OVS and OVN are you using in this setup?  If it's not latest
> > > > 3.1 and 23.03, then it's hard to talk about what/if performance
> > > > improvements are actually needed.
> > > >
> > >
> > > We are currently running ovs 3.1 and ovn 22.12 (in the process of
> > > upgrading to 23.03). `monitor-all` is currently disabled, but we want to
> > > try that as well.
> > >
> > Hi Felix, did you try upgrading and enabling "monitor-all"? How does it 
> > look now?
> >
> > > > > Northd and Neutron is connecting directly to north- and southbound 
> > > > > databases without the relays.
> > > >
> > > > One of the big things that is annoying is that Neutron connects to
> > > > Southbound database at all.  There are some reasons to do that,
> > > > but ideally that should be avoided.  I know that in the past limiting
> > > > the number of metadata agents was one of the mitigation strategies
> > > > for scaling issues.  Also, why can't it connect to relays?  There
> > > > shouldn't be too many transactions flowing towards Southbound DB
> > > > from the Neutron.
> > > >
> > >
> > > Thanks for that suggestion, that definitely makes sense.
> > >
> > Does this make a big difference? How many Neutron - SB connections are 
> > there?
> > What rings a bell is that Neutron is using the python OVSDB library which 
> > hasn't implemented the fast-resync feature (if I remember correctly).
>
> python-ovs has supported monitor_cond_since since v2.17.0 (though
> there may have been a bug that was fixed in 2.17.1). If fast resync
> isn't happening, then it should be considered a bug. With that said, I
> remember when I looked it a year or two ago, ovsdb-server didn't
> really use fast resync/monitor_cond_since unless it was running in
> raft cluster mode (it would reply, but with the last-txn-id as 0
> IIRC?). Does the ovsdb-relay code actually return the last-txn-id? I
> can set up an environment and run some tests, but maybe someone else
> already knows.

Looks like ovsdb-relay does support last-txn-id now:
https://github.com/openvswitch/ovs/commit/a3e97b1af1bdcaa802c6caa9e73087df7077d2b1,
but only in v3.0+.

> > At the same time, there is the feature leader-transfer-for-snapshot, which 
> > automatically transfer leader whenever a snapshot is to be written, which 
> > would happen frequently if your environment is very active.
>
> I believe snapshot should only be happening "no less frequently than
> 24 hours, with snapshots if there are more than 100 log entries and
> the log size has doubled, but no more frequently than every 10 mins"
> or something pretty close to that. So it seems like once the system
> got up to its expected size, you would just see updates every 24 hours
> since you obviously can't double in size forever. But it's possible
> I'm reading that wrong.
>
> > When a leader transfer happens, if Neutron set the option "leader-only" 
> > (only connects to leader) to SB DB (could someone confirm?), then when the 
> > leader transfer happens, all Neutron workers would reconnect to the new 
> > leader. With fast-resync, like what's implemented in C IDL and Go, the 
> > client that has cached the data would only request the delta when 
> > reconnecting. But since the python lib doesn't have this, the Neutron 
> > server would re-download full data when reconnecting ...
> > This is a speculation based on the information I have, and the assumptions 
> > need to be confirmed.
> >
> > > > >
> > > > > We needed to increase various timeouts on the ovsdb-server and client 
> > > > > side to get this to a mostly stable state:
> > > > > * inactivity probes of 60 seconds (for all connections between 
> > > > > ovsdb-server, relay and clients)
> > > > > * cluster election time of 50 seconds
> > > > >
> > > > > As long as none of the relays restarts the environment is quite 
> > > > > stable.
> > > > > However we see quite regularly the "Unreasonably long xxx ms poll 
> > > > > interval" messages ranging from 1000ms up 

Re: [ovs-discuss] Scaling OVN/Southbound

2023-07-05 Thread Felix Huettner via discuss
Hi Han,

On Fri, Jun 30, 2023 at 05:08:36PM -0700, Han Zhou wrote:
> On Wed, May 24, 2023 at 12:26 AM Felix Huettner via discuss <
> ovs-discuss@openvswitch.org> wrote:
> >
> > Hi Ilya,
> >
> > thank you for the detailed reply
> >
> > On Tue, May 23, 2023 at 05:25:49PM +0200, Ilya Maximets wrote:
> > > On 5/23/23 15:59, Felix Hüttner via discuss wrote:
> > > > Hi everyone,
> > >
> > > Hi, Felix.
> > >
> > > >
> > > > we are currently running an OVN Deployment with 450 Nodes. We run a 3
> node cluster for the northbound database and a 3 nodes cluster for the
> southbound database.
> > > > Between the southbound cluster and the ovn-controllers we have a
> layer of 24 ovsdb relays.
> > > > The setup is using TLS for all connections, however the TLS Server is
> handled by a traefik reverseproxy to offload this from the ovsdb
> > >
> > > The very important part of the system description is what versions
> > > of OVS and OVN are you using in this setup?  If it's not latest
> > > 3.1 and 23.03, then it's hard to talk about what/if performance
> > > improvements are actually needed.
> > >
> >
> > We are currently running ovs 3.1 and ovn 22.12 (in the process of
> > upgrading to 23.03). `monitor-all` is currently disabled, but we want to
> > try that as well.
> >
> Hi Felix, did you try upgrading and enabling "monitor-all"? How does it
> look now?

we have not upgraded yet, but we tried monitor-all and that provided a big
benefit in terms of stability.
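
For anyone else trying it: the knob is external-ids:ovn-monitor-all=true on
each chassis' local Open_vSwitch record (e.g. "ovs-vsctl set open .
external-ids:ovn-monitor-all=true"). Below is a python-ovs sketch of the same
change, assuming the default vswitchd schema path and socket:

import ovs.db.idl
import ovs.poller

SCHEMA = "/usr/share/openvswitch/vswitch.ovsschema"
REMOTE = "unix:/var/run/openvswitch/db.sock"

helper = ovs.db.idl.SchemaHelper(SCHEMA)
helper.register_table("Open_vSwitch")
idl = ovs.db.idl.Idl(REMOTE, helper)

# Wait for the initial contents to arrive.
seqno = idl.change_seqno
while idl.change_seqno == seqno:
    idl.run()
    if idl.change_seqno != seqno:
        break
    poller = ovs.poller.Poller()
    idl.wait(poller)
    poller.block()

txn = ovs.db.idl.Transaction(idl)
for row in idl.tables["Open_vSwitch"].rows.values():
    row.setkey("external_ids", "ovn-monitor-all", "true")
status = txn.commit_block()   # Transaction.SUCCESS / UNCHANGED == applied
print("commit status:", status)

The usual trade-off applies: each ovn-controller downloads and caches more of
the Southbound DB, in exchange for the server not having to evaluate
per-client monitor conditions.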

>
> > > > Northd and Neutron is connecting directly to north- and southbound
> databases without the relays.
> > >
> > > One of the big things that is annoying is that Neutron connects to
> > > Southbound database at all.  There are some reasons to do that,
> > > but ideally that should be avoided.  I know that in the past limiting
> > > the number of metadata agents was one of the mitigation strategies
> > > for scaling issues.  Also, why can't it connect to relays?  There
> > > shouldn't be too many transactions flowing towards Southbound DB
> > > from the Neutron.
> > >
> >
> > Thanks for that suggestion, that definitely makes sense.
> >
> Does this make a big difference? How many Neutron - SB connections are
> there?
> What rings a bell is that Neutron is using the python OVSDB library which
> hasn't implemented the fast-resync feature (if I remember correctly).
> At the same time, there is the feature leader-transfer-for-snapshot, which
> automatically transfer leader whenever a snapshot is to be written, which
> would happen frequently if your environment is very active.
> When a leader transfer happens, if Neutron set the option "leader-only"
> (only connects to leader) to SB DB (could someone confirm?), then when the
> leader transfer happens, all Neutron workers would reconnect to the new
> leader. With fast-resync, like what's implemented in C IDL and Go, the
> client that has cached the data would only request the delta when
> reconnecting. But since the python lib doesn't have this, the Neutron
> server would re-download full data when reconnecting ...
> This is a speculation based on the information I have, and the assumptions
> need to be confirmed.

We are currently working with upstream Neutron to get the leader-only
flag removed wherever we can. I guess the total number of connections
depends on the process count, which would be ~150 connections in total in
our case.

>
> > > >
> > > > We needed to increase various timeouts on the ovsdb-server and client
> side to get this to a mostly stable state:
> > > > * inactivity probes of 60 seconds (for all connections between
> ovsdb-server, relay and clients)
> > > > * cluster election time of 50 seconds
> > > >
> > > > As long as none of the relays restarts the environment is quite
> stable.
> > > > However we see quite regularly the "Unreasonably long xxx ms poll
> interval" messages ranging from 1000ms up to 4ms.
> > >
> > > With latest versions of OVS/OVN the CPU usage on Southbound DB
> > > servers without relays in our weekly 500-node ovn-heater runs
> > > stays below 10% during the test phase.  No large poll intervals
> > > are getting registered.
> > >
> > > Do you have more details on under which circumstances these
> > > large poll intervals occur?
> > >
> >
> > It seems to mostly happen on the initial connection of some client to
> > the ovsdb. From the few times we ran perf there it looks like the time
> > is spent in creating a monitor and during that sending out the updates
> > to the client side.
> >
> It is one of the worst case scenario for OVSDB when many clients initialize
> connections to it at the same time, when the size of data downloaded by
> each client is big.
> OVSDB relay, for what I understand, should greatly help on this. You have
> 24 relay nodes, which are supposed to share the burden. Are the SB DB and
> the relay instances running with sufficient CPU resources?
> Is it clear that initial connections from which clients (ovn-controller or
> Neutron) are causing this? If 

Re: [ovs-discuss] Scaling OVN/Southbound

2023-07-05 Thread Terry Wilson via discuss
On Fri, Jun 30, 2023 at 7:09 PM Han Zhou via discuss
 wrote:
>
>
>
> On Wed, May 24, 2023 at 12:26 AM Felix Huettner via discuss 
>  wrote:
> >
> > Hi Ilya,
> >
> > thank you for the detailed reply
> >
> > On Tue, May 23, 2023 at 05:25:49PM +0200, Ilya Maximets wrote:
> > > On 5/23/23 15:59, Felix Hüttner via discuss wrote:
> > > > Hi everyone,
> > >
> > > Hi, Felix.
> > >
> > > >
> > > > we are currently running an OVN Deployment with 450 Nodes. We run a 3 
> > > > node cluster for the northbound database and a 3 nodes cluster for the 
> > > > southbound database.
> > > > Between the southbound cluster and the ovn-controllers we have a layer 
> > > > of 24 ovsdb relays.
> > > > The setup is using TLS for all connections, however the TLS Server is 
> > > > handled by a traefik reverseproxy to offload this from the ovsdb
> > >
> > > The very important part of the system description is what versions
> > > of OVS and OVN are you using in this setup?  If it's not latest
> > > 3.1 and 23.03, then it's hard to talk about what/if performance
> > > improvements are actually needed.
> > >
> >
> > We are currently running ovs 3.1 and ovn 22.12 (in the process of
> > upgrading to 23.03). `monitor-all` is currently disabled, but we want to
> > try that as well.
> >
> Hi Felix, did you try upgrading and enabling "monitor-all"? How does it look 
> now?
>
> > > > Northd and Neutron is connecting directly to north- and southbound 
> > > > databases without the relays.
> > >
> > > One of the big things that is annoying is that Neutron connects to
> > > Southbound database at all.  There are some reasons to do that,
> > > but ideally that should be avoided.  I know that in the past limiting
> > > the number of metadata agents was one of the mitigation strategies
> > > for scaling issues.  Also, why can't it connect to relays?  There
> > > shouldn't be too many transactions flowing towards Southbound DB
> > > from the Neutron.
> > >
> >
> > Thanks for that suggestion, that definitely makes sense.
> >
> Does this make a big difference? How many Neutron - SB connections are there?
> What rings a bell is that Neutron is using the python OVSDB library which 
> hasn't implemented the fast-resync feature (if I remember correctly).

python-ovs has supported monitor_cond_since since v2.17.0 (though
there may have been a bug that was fixed in 2.17.1). If fast resync
isn't happening, then it should be considered a bug. With that said, I
remember when I looked at it a year or two ago, ovsdb-server didn't
really use fast resync/monitor_cond_since unless it was running in
raft cluster mode (it would reply, but with the last-txn-id as 0
IIRC?). Does the ovsdb-relay code actually return the last-txn-id? I
can set up an environment and run some tests, but maybe someone else
already knows.
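
For context, fast resync is negotiated inside the IDL itself, so client code
does not change at all. A plain python-ovs loop like the sketch below
(endpoint and schema path are illustrative) will use monitor_cond_since on
>= 2.17, and whether it actually gets a delta instead of a full download then
depends on the server side:

import ovs.db.idl
import ovs.poller

SB_SCHEMA = "/usr/share/ovn/ovn-sb.ovsschema"
SB_REMOTE = "ssl:sb.example.net:6642"

helper = ovs.db.idl.SchemaHelper(SB_SCHEMA)
helper.register_table("Port_Binding")   # monitor only what we need
helper.register_table("Chassis")
idl = ovs.db.idl.Idl(SB_REMOTE, helper)

while True:
    # On reconnect the IDL asks for updates since the last known txn-id;
    # if the server answers with last-txn-id 0 it falls back to a full
    # re-download, which is the behavior described above.
    idl.run()
    poller = ovs.poller.Poller()
    idl.wait(poller)
    poller.block()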

> At the same time, there is the feature leader-transfer-for-snapshot, which 
> automatically transfer leader whenever a snapshot is to be written, which 
> would happen frequently if your environment is very active.

I believe snapshots should only be happening "no less frequently than
24 hours, with snapshots if there are more than 100 log entries and
the log size has doubled, but no more frequently than every 10 mins"
or something pretty close to that. So it seems like once the system
got up to its expected size, you would just see snapshots every 24 hours
since you obviously can't double in size forever. But it's possible
I'm reading that wrong.

> When a leader transfer happens, if Neutron set the option "leader-only" (only 
> connects to leader) to SB DB (could someone confirm?), then when the leader 
> transfer happens, all Neutron workers would reconnect to the new leader. With 
> fast-resync, like what's implemented in C IDL and Go, the client that has 
> cached the data would only request the delta when reconnecting. But since the 
> python lib doesn't have this, the Neutron server would re-download full data 
> when reconnecting ...
> This is a speculation based on the information I have, and the assumptions 
> need to be confirmed.
>
> > > >
> > > > We needed to increase various timeouts on the ovsdb-server and client 
> > > > side to get this to a mostly stable state:
> > > > * inactivity probes of 60 seconds (for all connections between 
> > > > ovsdb-server, relay and clients)
> > > > * cluster election time of 50 seconds
> > > >
> > > > As long as none of the relays restarts the environment is quite stable.
> > > > However we see quite regularly the "Unreasonably long xxx ms poll 
> > > > interval" messages ranging from 1000ms up to 4ms.
> > >
> > > With latest versions of OVS/OVN the CPU usage on Southbound DB
> > > servers without relays in our weekly 500-node ovn-heater runs
> > > stays below 10% during the test phase.  No large poll intervals
> > > are getting registered.
> > >
> > > Do you have more details on under which circumstances these
> > > large poll intervals occur?
> > >
> >
> > It seems to mostly happen on the initial 

Re: [ovs-discuss] Scaling OVN/Southbound

2023-06-30 Thread Han Zhou via discuss
On Wed, May 24, 2023 at 12:26 AM Felix Huettner via discuss <
ovs-discuss@openvswitch.org> wrote:
>
> Hi Ilya,
>
> thank you for the detailed reply
>
> On Tue, May 23, 2023 at 05:25:49PM +0200, Ilya Maximets wrote:
> > On 5/23/23 15:59, Felix Hüttner via discuss wrote:
> > > Hi everyone,
> >
> > Hi, Felix.
> >
> > >
> > > we are currently running an OVN Deployment with 450 Nodes. We run a 3
node cluster for the northbound database and a 3 nodes cluster for the
southbound database.
> > > Between the southbound cluster and the ovn-controllers we have a
layer of 24 ovsdb relays.
> > > The setup is using TLS for all connections, however the TLS Server is
handled by a traefik reverseproxy to offload this from the ovsdb
> >
> > The very important part of the system description is what versions
> > of OVS and OVN are you using in this setup?  If it's not latest
> > 3.1 and 23.03, then it's hard to talk about what/if performance
> > improvements are actually needed.
> >
>
> We are currently running ovs 3.1 and ovn 22.12 (in the process of
> upgrading to 23.03). `monitor-all` is currently disabled, but we want to
> try that as well.
>
Hi Felix, did you try upgrading and enabling "monitor-all"? How does it
look now?

> > > Northd and Neutron is connecting directly to north- and southbound
databases without the relays.
> >
> > One of the big things that is annoying is that Neutron connects to
> > Southbound database at all.  There are some reasons to do that,
> > but ideally that should be avoided.  I know that in the past limiting
> > the number of metadata agents was one of the mitigation strategies
> > for scaling issues.  Also, why can't it connect to relays?  There
> > shouldn't be too many transactions flowing towards Southbound DB
> > from the Neutron.
> >
>
> Thanks for that suggestion, that definitely makes sense.
>
Does this make a big difference? How many Neutron - SB connections are
there?
What rings a bell is that Neutron is using the python OVSDB library which
hasn't implemented the fast-resync feature (if I remember correctly).
At the same time, there is the leader-transfer-for-snapshot feature, which
automatically transfers leadership whenever a snapshot is about to be
written, which would happen frequently if your environment is very active.
If Neutron sets the option "leader-only" (only connect to the leader) for
the SB DB (could someone confirm?), then whenever a leader transfer happens,
all Neutron workers would reconnect to the new leader. With fast-resync,
like what's implemented in the C and Go IDLs, a client that has the data
cached would only request the delta when reconnecting. But since the python
lib doesn't have this, the Neutron server would re-download the full data
when reconnecting ...
This is a speculation based on the information I have, and the assumptions
need to be confirmed.

> > >
> > > We needed to increase various timeouts on the ovsdb-server and client
side to get this to a mostly stable state:
> > > * inactivity probes of 60 seconds (for all connections between
ovsdb-server, relay and clients)
> > > * cluster election time of 50 seconds
> > >
> > > As long as none of the relays restarts the environment is quite
stable.
> > > However we see quite regularly the "Unreasonably long xxx ms poll
interval" messages ranging from 1000ms up to 4ms.
> >
> > With latest versions of OVS/OVN the CPU usage on Southbound DB
> > servers without relays in our weekly 500-node ovn-heater runs
> > stays below 10% during the test phase.  No large poll intervals
> > are getting registered.
> >
> > Do you have more details on under which circumstances these
> > large poll intervals occur?
> >
>
> It seems to mostly happen on the initial connection of some client to
> the ovsdb. From the few times we ran perf there it looks like the time
> is spent in creating a monitor and during that sending out the updates
> to the client side.
>
This is one of the worst-case scenarios for OVSDB: many clients initializing
connections to it at the same time, while the size of the data downloaded by
each client is big.
OVSDB relay, from what I understand, should greatly help with this. You have
24 relay nodes, which are supposed to share the burden. Are the SB DB and
the relay instances running with sufficient CPU resources?
Is it clear which clients' initial connections (ovn-controller or
Neutron) are causing this? If it is Neutron, the above speculation about
the lack of fast-resync in the Neutron workers may be worth checking.
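
As a rough illustration of why simultaneous initial connections hurt (the
numbers below are made up for the sketch, not measurements from this
deployment):

# Back-of-the-envelope cost of an initial-sync storm.
db_size_mb = 50   # monitored SB contents serialized per client (illustrative)
clients = 150     # e.g. Neutron workers reconnecting at the same time

total_mb = db_size_mb * clients
print(f"~{total_mb} MB to serialize and send for one reconnect storm")
# ~7500 MB pushed out by ovsdb-server (or spread across the relays),
# which is where multi-second poll intervals come from.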

> If it is of interest, I can try to get a perf report once this occurs
> again.
>
> > >
> > > If a large amount of relays restart simultaneously they can also
bring the ovsdb cluster to fail as the poll interval exceeds the cluster
election time.
> > > This happens with the relays already syncing the data from all 3
ovsdb servers.
> >
> > There was a performance issue with upgrades and simultaneous
> > reconnections, but it should be mostly fixed on the current master
> > branch, 

Re: [ovs-discuss] Scaling OVN/Southbound

2023-05-24 Thread Lucas Alvares Gomes via discuss
Hi,

On Tue, May 23, 2023 at 5:42 PM Daniel Alvarez via discuss
 wrote:
>
> +Lucas
>
> > On 23 May 2023, at 17:25, Ilya Maximets via discuss 
> >  wrote:
> >
> > On 5/23/23 15:59, Felix Hüttner via discuss wrote:
> >> Hi everyone,
> >
> > Hi, Felix.
> >
> >>
> >> we are currently running an OVN Deployment with 450 Nodes. We run a 3 node 
> >> cluster for the northbound database and a 3 nodes cluster for the 
> >> southbound database.
> >> Between the southbound cluster and the ovn-controllers we have a layer of 
> >> 24 ovsdb relays.
> >> The setup is using TLS for all connections, however the TLS Server is 
> >> handled by a traefik reverseproxy to offload this from the ovsdb
> >
> > The very important part of the system description is what versions
> > of OVS and OVN are you using in this setup?  If it's not latest
> > 3.1 and 23.03, then it's hard to talk about what/if performance
> > improvements are actually needed.
> >
> >> Northd and Neutron is connecting directly to north- and southbound 
> >> databases without the relays.
> >
> > One of the big things that is annoying is that Neutron connects to
> > Southbound database at all.  There are some reasons to do that,
> > but ideally that should be avoided.
>
> We initiated an effort to connect only to the NB database. Lucas (CC’ed) is 
> working on it at the moment because the main piece of info we are missing is 
> the location of the ports. With this, we can probably stop connecting to the 
> SB database but we will move part of the problem to the NB (less of it 
> hopefully).
>

Thanks for raising this, Daniel.

The thread mentioned is this one:
https://mail.openvswitch.org/pipermail/ovs-dev/2023-April/403635.html;
the conversation in that thread is relevant to this topic, I believe.

With the idea above, we would be able to avoid ovn-bgp-agent [0]
connecting to the Southbound Database; the gateway port location
information is the last piece we need for that project.

As for Neutron itself, we still have some more work to do, especially
around the Chassis table. For example, at the moment it's the CMS's job
to clean up orphan Chassis entries in the Southbound; ovn-controller
will not delete the record unless it was gracefully stopped (which is
not always the case, especially during hard failures).

Another example is the configuration passed to the CMS via the
"ovn-cms-options" which is only exposed in the Chassis table at the
moment. That's how we get information about GW nodes
("enable-chassis-as-gw"), AZs, etc...

We also do a few things around the Port_Binding/Datapath tables (we
look for GW ports, local ports for the metadata agent, etc...). The
problem with the physical location of ports could indeed be solved by
expanding the work from the thread above. In that thread, Han mentioned
that perhaps we could explore having a "status/detail" column in the
LSPs that would hold this type of information (hosting chassis, port
up/down, etc...) which the CMS could consume. The more I think about it,
the more I think it's a great idea for CMSs.

[0] https://opendev.org/openstack/ovn-bgp-agent
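
And the port-location lookup that currently forces a Southbound connection
boils down to reading Port_Binding, roughly like this (again only a sketch
with an illustrative endpoint; note that in the python IDL an optional
reference column may surface as a zero- or one-element list, hence the
defensive access):

import ovs.db.idl
import ovs.poller

helper = ovs.db.idl.SchemaHelper("/usr/share/ovn/ovn-sb.ovsschema")
helper.register_table("Port_Binding")
helper.register_table("Chassis")        # needed to resolve the reference
idl = ovs.db.idl.Idl("ssl:sb.example.net:6642", helper, leader_only=False)

seqno = idl.change_seqno
while idl.change_seqno == seqno:        # wait for the initial download
    idl.run()
    if idl.change_seqno != seqno:
        break
    poller = ovs.poller.Poller()
    idl.wait(poller)
    poller.block()

for pb in idl.tables["Port_Binding"].rows.values():
    ch = pb.chassis
    if isinstance(ch, (list, tuple)):   # optional ref: [] or [Row]
        ch = ch[0] if ch else None
    print(pb.logical_port, "->", ch.name if ch else "unbound")

A hypothetical "status" column on Logical_Switch_Port, as discussed in the
linked thread, would let the CMS read the same information from the
Northbound DB instead.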

>
>
> >  I know that in the past limiting
> > the number of metadata agents was one of the mitigation strategies
> > for scaling issues.  Also, why can't it connect to relays?  There
> > shouldn't be too many transactions flowing towards Southbound DB
> > from the Neutron.
> >
> >>
> >> We needed to increase various timeouts on the ovsdb-server and client side 
> >> to get this to a mostly stable state:
> >> * inactivity probes of 60 seconds (for all connections between 
> >> ovsdb-server, relay and clients)
> >> * cluster election time of 50 seconds
> >>
> >> As long as none of the relays restarts the environment is quite stable.
> >> However we see quite regularly the "Unreasonably long xxx ms poll 
> >> interval" messages ranging from 1000ms up to 4ms.
> >
> > With latest versions of OVS/OVN the CPU usage on Southbound DB
> > servers without relays in our weekly 500-node ovn-heater runs
> > stays below 10% during the test phase.  No large poll intervals
> > are getting registered.
> >
> > Do you have more details on under which circumstances these
> > large poll intervals occur?
> >
> >>
> >> If a large amount of relays restart simultaneously they can also bring the 
> >> ovsdb cluster to fail as the poll interval exceeds the cluster election 
> >> time.
> >> This happens with the relays already syncing the data from all 3 ovsdb 
> >> servers.
> >
> > There was a performance issue with upgrades and simultaneous
> > reconnections, but it should be mostly fixed on the current master
> > branch, i.e. in the upcoming 3.2 release:
> >  
> > https://patchwork.ozlabs.org/project/openvswitch/list/?series=348259=*
> >
> >>
> >> We would like to improve this significantly to ensure on the one hand that 
> >> our ovsdb clusters will survive unplanned load without issues and on the 
> >> other hand to keep the poll intervals short.
> >> We would like to ensure a short poll interval to 

Re: [ovs-discuss] Scaling OVN/Southbound

2023-05-24 Thread Felix Huettner via discuss
Hi Dan,

On Tue, May 23, 2023 at 09:13:03AM -0500, Dan Williams wrote:
> On Tue, 2023-05-23 at 13:59 +, Felix Hüttner via discuss wrote:
> > Hi everyone,
> >
> > we are currently running an OVN Deployment with 450 Nodes. We run a 3
> > node cluster for the northbound database and a 3 nodes cluster for
> > the southbound database.
> > Between the southbound cluster and the ovn-controllers we have a
> > layer of 24 ovsdb relays.
> > The setup is using TLS for all connections, however the TLS Server is
> > handled by a traefik reverseproxy to offload this from the ovsdb
> > Northd and Neutron is connecting directly to north- and southbound
> > databases without the relays.
> >
> > We needed to increase various timeouts on the ovsdb-server and client
> > side to get this to a mostly stable state:
> > * inactivity probes of 60 seconds (for all connections between ovsdb-
> > server, relay and clients)
> > * cluster election time of 50 seconds
> >
> > As long as none of the relays restarts the environment is quite
> > stable.
> > However we see quite regularly the "Unreasonably long xxx ms poll
> > interval" messages ranging from 1000ms up to 4ms.
>
> I probably missed it from previous messages, but:
>
> 1) are your ovn-controllers using conditional monitoring for the SB, or
> monitor-all?
>
> 2) what OVS version are your DB servers?
>

We are using conditional monitoring, but will explore monitor-all soon.
We are running 3.1 for ovs and 22.12 for ovn.

Thanks
Felix

> Dan
>
> >
> > If a large amount of relays restart simultaneously they can also
> > bring the ovsdb cluster to fail as the poll interval exceeds the
> > cluster election time.
> > This happens with the relays already syncing the data from all 3
> > ovsdb servers.
> >
> > We would like to improve this significantly to ensure on the one hand
> > that our ovsdb clusters will survive unplanned load without issues
> > and on the other hand to keep the poll intervals short.
> > We would like to ensure a short poll interval to allow us to act on
> > distributed-gateway-ports failovers and failover of virtual port in a
> > timely manner (ideally below 1 second).
> >
> > To do this we found the following solutions that were discussed in
> > the past:
> > 1. Implementing multithreading for ovsdb
> > https://patchwork.ozlabs.org/project/openvswitch/list/?series===*=multithreading==
> > 2. Changing the storage backend of OVN to an alternative (e.g. etcd)
> > https://mail.openvswitch.org/pipermail/ovs-discuss/2016-July/041733.html
> >
> > Both of these discussion are from 2016, not sure if more up-to-date
> > ones exist.
> >
> > I would like to ask if there are already existing discussions on
> > scaling ovsdb further/faster?
> >
> > > From my perspective whatever such a solution might be, would no
> > > longer require relays and allow the ovsdb servers to handle load
> > > gracefully.
> > I personally see that multithreading for ovsdb sounds quite
> > promising, as that would allow us to separate the raft/cluster
> > communication from the client connections.
> > This should allow us to keep the cluster healthy even under
> > significant pressure from clients.
> >
> > Thank you
> >
> > --
> > Felix Huettner
> >
> >
> >
>


Re: [ovs-discuss] Scaling OVN/Southbound

2023-05-24 Thread Felix Huettner via discuss
Hi Ilya,

thank you for the detailed reply

On Tue, May 23, 2023 at 05:25:49PM +0200, Ilya Maximets wrote:
> On 5/23/23 15:59, Felix Hüttner via discuss wrote:
> > Hi everyone,
>
> Hi, Felix.
>
> >
> > we are currently running an OVN Deployment with 450 Nodes. We run a 3 node 
> > cluster for the northbound database and a 3 nodes cluster for the 
> > southbound database.
> > Between the southbound cluster and the ovn-controllers we have a layer of 
> > 24 ovsdb relays.
> > The setup is using TLS for all connections, however the TLS Server is 
> > handled by a traefik reverseproxy to offload this from the ovsdb
>
> The very important part of the system description is what versions
> of OVS and OVN are you using in this setup?  If it's not latest
> 3.1 and 23.03, then it's hard to talk about what/if performance
> improvements are actually needed.
>

We are currently running ovs 3.1 and ovn 22.12 (in the process of
upgrading to 23.03). `monitor-all` is currently disabled, but we want to
try that as well.

> > Northd and Neutron is connecting directly to north- and southbound 
> > databases without the relays.
>
> One of the big things that is annoying is that Neutron connects to
> Southbound database at all.  There are some reasons to do that,
> but ideally that should be avoided.  I know that in the past limiting
> the number of metadata agents was one of the mitigation strategies
> for scaling issues.  Also, why can't it connect to relays?  There
> shouldn't be too many transactions flowing towards Southbound DB
> from the Neutron.
>

Thanks for that suggestion, that definitely makes sense.

> >
> > We needed to increase various timeouts on the ovsdb-server and client side 
> > to get this to a mostly stable state:
> > * inactivity probes of 60 seconds (for all connections between 
> > ovsdb-server, relay and clients)
> > * cluster election time of 50 seconds
> >
> > As long as none of the relays restarts the environment is quite stable.
> > However we see quite regularly the "Unreasonably long xxx ms poll interval" 
> > messages ranging from 1000ms up to 4ms.
>
> With latest versions of OVS/OVN the CPU usage on Southbound DB
> servers without relays in our weekly 500-node ovn-heater runs
> stays below 10% during the test phase.  No large poll intervals
> are getting registered.
>
> Do you have more details on under which circumstances these
> large poll intervals occur?
>

It seems to mostly happen on the initial connection of some client to
the ovsdb. From the few times we ran perf there, it looks like the time
is spent creating a monitor and, as part of that, sending out the updates
to the client side.

If it is of interest, I can try to get a perf report once this occurs
again.
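A rough sketch of how we would capture it (assuming debug symbols are
installed and only the southbound ovsdb-server runs on that host):

  # sample the SB ovsdb-server with call graphs for 30 seconds
  perf record --call-graph dwarf -p "$(pidof ovsdb-server)" -- sleep 30
  # summarize the hottest call paths
  perf report --stdio | head -n 60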

> >
> > If a large number of relays restart simultaneously, they can also cause the 
> > ovsdb cluster to fail, as the poll interval exceeds the cluster election 
> > time.
> > This happens even with the relays already syncing the data from all 3 ovsdb 
> > servers.
>
> There was a performance issue with upgrades and simultaneous
> reconnections, but it should be mostly fixed on the current master
> branch, i.e. in the upcoming 3.2 release:
>   https://patchwork.ozlabs.org/project/openvswitch/list/?series=348259=*
>

That sounds like it might be similar to when our issue occurs. I'll
see if we can try this out.

> >
> > We would like to improve this significantly to ensure, on the one hand, that 
> > our ovsdb clusters survive unplanned load without issues and, on the 
> > other hand, to keep the poll intervals short.
> > We would like to ensure a short poll interval to allow us to act on 
> > distributed-gateway-port failovers and virtual port failovers in a 
> > timely manner (ideally below 1 second).
>
> These are good goals.  But are you sure they are not already
> addressed with the most recent versions of OVS/OVN ?
>

I was not sure, but all your feedback helped clarify that.

> >
> > To do this we found the following solutions that were discussed in the past:
> > 1. Implementing multithreading for ovsdb 
> > https://patchwork.ozlabs.org/project/openvswitch/list/?series===*=multithreading==
>
> We moved the compaction process to a separate thread in 3.0.
> This partially addressed the multi-threading topic.  General
> handling of client requests/updates in separate threads will
> require significant changes in the internal architecture, AFAICT.
> So, I'd like to avoid doing that unless necessary.  So far we
> were able to overcome almost all the performance challenges
> with simple algorithmic changes instead.
>

I definitely get that, since it would be quite a complex change to do.
The only benefit I would see in handling clients in separate threads is
that it reduces the impact of performance challenges.
E.g. it would still allow the cluster to work together healthily and make
progress, but individual reconnects would be slow.

That benefit would be quite significant from my perspective as it makes
the solution more 

Re: [ovs-discuss] Scaling OVN/Southbound

2023-05-23 Thread Daniel Alvarez via discuss
+Lucas

> On 23 May 2023, at 17:25, Ilya Maximets via discuss 
>  wrote:
> 
> On 5/23/23 15:59, Felix Hüttner via discuss wrote:
>> Hi everyone,
> 
> Hi, Felix.
> 
>> 
>> we are currently running an OVN deployment with 450 nodes. We run a 3-node 
>> cluster for the northbound database and a 3-node cluster for the southbound 
>> database.
>> Between the southbound cluster and the ovn-controllers we have a layer of 24 
>> ovsdb relays.
>> The setup is using TLS for all connections; however, the TLS server side is 
>> handled by a traefik reverse proxy to offload this from the ovsdb servers.
> 
> The very important part of the system description is what versions
> of OVS and OVN are you using in this setup?  If it's not latest
> 3.1 and 23.03, then it's hard to talk about what/if performance
> improvements are actually needed.
> 
>> Northd and Neutron are connecting directly to the north- and southbound 
>> databases without the relays.
> 
> One of the big things that is annoying is that Neutron connects to
> Southbound database at all.  There are some reasons to do that,
> but ideally that should be avoided.

We initiated an effort to connect only to the NB database. Lucas (CC’ed) is 
working on it at the moment because the main piece of info we are missing is 
the location of the ports. With this, we can probably stop connecting to the SB 
database, but we will move part of the problem to the NB (less of it, hopefully).
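For context, the location information we mean is essentially the chassis
column of the SB Port_Binding table; roughly the kind of lookup involved (a
sketch, the port name is a placeholder):

  # which chassis currently claims this logical port?
  ovn-sbctl --columns=logical_port,chassis find Port_Binding logical_port=<neutron-port-id>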


>  I know that in the past limiting
> the number of metadata agents was one of the mitigation strategies
> for scaling issues.  Also, why can't it connect to relays?  There
> shouldn't be too many transactions flowing towards Southbound DB
> from the Neutron.
> 
>> 
>> We needed to increase various timeouts on the ovsdb-server and client side 
>> to get this to a mostly stable state:
>> * inactivity probes of 60 seconds (for all connections between ovsdb-server, 
>> relay and clients)
>> * cluster election time of 50 seconds
>> 
>> As long as none of the relays restarts, the environment is quite stable.
>> However, we quite regularly see "Unreasonably long xxx ms poll interval" 
>> messages ranging from 1000ms up to 4ms.
> 
> With latest versions of OVS/OVN the CPU usage on Southbound DB
> servers without relays in our weekly 500-node ovn-heater runs
> stays below 10% during the test phase.  No large poll intervals
> are getting registered.
> 
> Do you have more details on under which circumstances these
> large poll intervals occur?
> 
>> 
>> If a large number of relays restart simultaneously, they can also cause the 
>> ovsdb cluster to fail, as the poll interval exceeds the cluster election time.
>> This happens even with the relays already syncing the data from all 3 ovsdb 
>> servers.
> 
> There was a performance issue with upgrades and simultaneous
> reconnections, but it should be mostly fixed on the current master
> branch, i.e. in the upcoming 3.2 release:
>  https://patchwork.ozlabs.org/project/openvswitch/list/?series=348259=*
> 
>> 
>> We would like to improve this significantly to ensure, on the one hand, that 
>> our ovsdb clusters survive unplanned load without issues and, on the 
>> other hand, to keep the poll intervals short.
>> We would like to ensure a short poll interval to allow us to act on 
>> distributed-gateway-port failovers and virtual port failovers in a timely 
>> manner (ideally below 1 second).
> 
> These are good goals.  But are you sure they are not already
> addressed with the most recent versions of OVS/OVN ?
> 
>> 
>> To do this we found the following solutions that were discussed in the past:
>> 1. Implementing multithreading for ovsdb 
>> https://patchwork.ozlabs.org/project/openvswitch/list/?series===*=multithreading==
> 
> We moved the compaction process to a separate thread in 3.0.
> This partially addressed the multi-threading topic.  General
> handling of client requests/updates in separate threads will
> require significant changes in the internal architecture, AFAICT.
> So, I'd like to avoid doing that unless necessary.  So far we
> were able to overcome almost all the performance challenges
> with simple algorithmic changes instead.
> 
>> 2. Changing the storage backend of OVN to an alternative (e.g. etcd) 
>> https://mail.openvswitch.org/pipermail/ovs-discuss/2016-July/041733.html
> 
> There was an ovsdb-etcd project, but it didn't manage to provide
> better performance in comparison with ovsdb-server.  So it was
> ultimately abandoned: https://github.com/IBM/ovsdb-etcd
> 
>> 
>> Both of these discussions are from 2016; I am not sure if more up-to-date ones 
>> exist.
>> 
>> I would like to ask if there are already existing discussions on scaling 
>> ovsdb further/faster?
> 
> This again comes down to the question of what versions you're using.  I'm
> currently not aware of any major performance issues for ovsdb-server
> on the most recent code, besides the conditional monitoring, which is
> not entirely OVSDB server's issue.  And it is also likely to become
> a bit better 

Re: [ovs-discuss] Scaling OVN/Southbound

2023-05-23 Thread Dan Williams via discuss
On Tue, 2023-05-23 at 13:59 +, Felix Hüttner via discuss wrote:
> Hi everyone,
> 
> we are currently running an OVN deployment with 450 nodes. We run a
> 3-node cluster for the northbound database and a 3-node cluster for
> the southbound database.
> Between the southbound cluster and the ovn-controllers we have a
> layer of 24 ovsdb relays.
> The setup is using TLS for all connections; however, the TLS server
> side is handled by a traefik reverse proxy to offload this from the
> ovsdb servers.
> Northd and Neutron are connecting directly to the north- and
> southbound databases without the relays.
> 
> We needed to increase various timeouts on the ovsdb-server and client
> side to get this to a mostly stable state:
> * inactivity probes of 60 seconds (for all connections between ovsdb-
> server, relay and clients)
> * cluster election time of 50 seconds
> 
> As long as none of the relays restarts, the environment is quite
> stable.
> However, we quite regularly see "Unreasonably long xxx ms poll
> interval" messages ranging from 1000ms up to 4ms.

I probably missed it from previous messages, but:

1) are your ovn-controllers using conditional monitoring for the SB, or
monitor-all?

2) what OVS version are your DB servers running?

Dan

> 
> If a large number of relays restart simultaneously, they can also
> cause the ovsdb cluster to fail, as the poll interval exceeds the
> cluster election time.
> This happens even with the relays already syncing the data from all 3
> ovsdb servers.
> 
> We would like to improve this significantly to ensure, on the one
> hand, that our ovsdb clusters survive unplanned load without issues
> and, on the other hand, to keep the poll intervals short.
> We would like to ensure a short poll interval to allow us to act on
> distributed-gateway-port failovers and virtual port failovers in a
> timely manner (ideally below 1 second).
> 
> To do this we found the following solutions that were discussed in
> the past:
> 1. Implementing multithreading for ovsdb
> https://patchwork.ozlabs.org/project/openvswitch/list/?series===*=multithreading==
> 2. Changing the storage backend of OVN to an alternative (e.g. etcd)
> https://mail.openvswitch.org/pipermail/ovs-discuss/2016-July/041733.html
> 
> Both of these discussions are from 2016; I am not sure if more
> up-to-date ones exist.
> 
> I would like to ask if there are already existing discussions on
> scaling ovsdb further/faster?
> 
> From my perspective, whatever such a solution might be, it would no
> longer require relays and would allow the ovsdb servers to handle
> load gracefully.
> I personally think that multithreading for ovsdb sounds quite
> promising, as it would allow us to separate the raft/cluster
> communication from the client connections.
> This should allow us to keep the cluster healthy even under
> significant pressure from clients.
> 
> Thank you
> 
> --
> Felix Huettner
> 


Re: [ovs-discuss] Scaling OVN/Southbound

2023-05-23 Thread Ilya Maximets via discuss
On 5/23/23 15:59, Felix Hüttner via discuss wrote:
> Hi everyone,

Hi, Felix.

> 
> we are currently running an OVN deployment with 450 nodes. We run a 3-node 
> cluster for the northbound database and a 3-node cluster for the southbound 
> database.
> Between the southbound cluster and the ovn-controllers we have a layer of 24 
> ovsdb relays.
> The setup is using TLS for all connections; however, the TLS server side is 
> handled by a traefik reverse proxy to offload this from the ovsdb servers.

The very important part of the system description is what versions
of OVS and OVN are you using in this setup?  If it's not latest
3.1 and 23.03, then it's hard to talk about what/if performance
improvements are actually needed.

> Northd and Neutron are connecting directly to the north- and southbound 
> databases without the relays.

One of the big things that is annoying is that Neutron connects to
Southbound database at all.  There are some reasons to do that,
but ideally that should be avoided.  I know that in the past limiting
the number of metadata agents was one of the mitigation strategies
for scaling issues.  Also, why can't it connect to relays?  There
shouldn't be too many transactions flowing towards Southbound DB
from the Neutron.

> 
> We needed to increase various timeouts on the ovsdb-server and client side to 
> get this to a mostly stable state:
> * inactivity probes of 60 seconds (for all connections between ovsdb-server, 
> relay and clients)
> * cluster election time of 50 seconds
> 
> As long as none of the relays restarts, the environment is quite stable.
> However, we quite regularly see "Unreasonably long xxx ms poll interval" 
> messages ranging from 1000ms up to 4ms.

With latest versions of OVS/OVN the CPU usage on Southbound DB
servers without relays in our weekly 500-node ovn-heater runs
stays below 10% during the test phase.  No large poll intervals
are getting registered.

Do you have more details on under which circumstances these
large poll intervals occur?

> 
> If a large number of relays restart simultaneously, they can also cause the 
> ovsdb cluster to fail, as the poll interval exceeds the cluster election time.
> This happens even with the relays already syncing the data from all 3 ovsdb 
> servers.

There was a performance issue with upgrades and simultaneous
reconnections, but it should be mostly fixed on the current master
branch, i.e. in the upcoming 3.2 release:
  https://patchwork.ozlabs.org/project/openvswitch/list/?series=348259=*

> 
> We would like to improve this significantly to ensure, on the one hand, that 
> our ovsdb clusters survive unplanned load without issues and, on the other 
> hand, to keep the poll intervals short.
> We would like to ensure a short poll interval to allow us to act on 
> distributed-gateway-port failovers and virtual port failovers in a timely 
> manner (ideally below 1 second).

These are good goals.  But are you sure they are not already
addressed with the most recent versions of OVS/OVN ?

> 
> To do this we found the following solutions that were discussed in the past:
> 1. Implementing multithreading for ovsdb 
> https://patchwork.ozlabs.org/project/openvswitch/list/?series===*=multithreading==

We moved the compaction process to a separate thread in 3.0.
This partially addressed the multi-threading topic.  General
handling of client requests/updates in separate threads will
require significant changes in the internal architecture, AFAICT.
So, I'd like to avoid doing that unless necessary.  So far we
were able to overcome almost all the performance challenges
with simple algorithmic changes instead.

> 2. Changing the storage backend of OVN to an alternative (e.g. etcd) 
> https://mail.openvswitch.org/pipermail/ovs-discuss/2016-July/041733.html

There was an ovsdb-etcd project, but it didn't manage to provide
better performance in comparison with ovsdb-server.  So it was
ultimately abandoned: https://github.com/IBM/ovsdb-etcd

> 
> Both of these discussions are from 2016; I am not sure if more up-to-date ones 
> exist.
> 
> I would like to ask if there are already existing discussions on scaling 
> ovsdb further/faster?

This again comes down to the question of what versions you're using.  I'm
currently not aware of any major performance issues for ovsdb-server
on the most recent code, besides the conditional monitoring, which is
not entirely OVSDB server's issue.  And it is also likely to become
a bit better in 3.2:
  
https://patchwork.ozlabs.org/project/openvswitch/patch/20230518121425.550048-1-i.maxim...@ovn.org/

> 
> From my perspective, whatever such a solution might be, it would no longer 
> require relays and would allow the ovsdb servers to handle load gracefully.
> I personally think that multithreading for ovsdb sounds quite promising, as 
> it would allow us to separate the raft/cluster communication from the 
> client connections.
> This should allow us to keep the cluster healthy even under significant 
> pressure from clients.

Again, good goals.  I'm 

[ovs-discuss] Scaling OVN/Southbound

2023-05-23 Thread Felix Hüttner via discuss
Hi everyone,

we are currently running an OVN deployment with 450 nodes. We run a 3-node 
cluster for the northbound database and a 3-node cluster for the southbound 
database.
Between the southbound cluster and the ovn-controllers we have a layer of 24 
ovsdb relays.
The setup is using TLS for all connections; however, the TLS server side is 
handled by a traefik reverse proxy to offload this from the ovsdb servers.
Northd and Neutron are connecting directly to the north- and southbound 
databases without the relays.

We needed to increase various timeouts on the ovsdb-server and client side to 
get this to a mostly stable state:
* inactivity probes of 60 seconds (for all connections between ovsdb-server, 
relay and clients)
* cluster election time of 50 seconds
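
Concretely, the knobs we touched look roughly like the following (socket paths
and targets are from our deployment; treat this as a sketch rather than exact
commands):

  # probe interval used by ovn-controller towards its SB/relay remote (per chassis)
  ovs-vsctl set open_vswitch . external_ids:ovn-remote-probe-interval=60000
  # inactivity probe on the southbound listener (assumes a single Connection row already exists)
  ovn-sbctl set connection . inactivity_probe=60000
  # RAFT election timer, changed on the cluster leader; it can only roughly double
  # per call, so this has to be repeated until it reaches 50000 ms
  ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/change-election-timer OVN_Southbound 50000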

As long as none of the relays restarts, the environment is quite stable.
However, we quite regularly see "Unreasonably long xxx ms poll interval" 
messages ranging from 1000ms up to 4ms.

If a large number of relays restart simultaneously, they can also cause the 
ovsdb cluster to fail, as the poll interval exceeds the cluster election time.
This happens even with the relays already syncing the data from all 3 ovsdb 
servers.

We would like to improve this significantly to ensure, on the one hand, that our 
ovsdb clusters survive unplanned load without issues and, on the other hand, to 
keep the poll intervals short.
We would like to ensure a short poll interval to allow us to act on 
distributed-gateway-port failovers and virtual port failovers in a timely 
manner (ideally below 1 second).
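
For reference, the distributed gateway failovers we mean are the ones driven by
gateway-chassis priorities on the router ports, e.g. (a sketch with made-up
names):

  # prefer chassis-a for the external router port, fall back to chassis-b
  ovn-nbctl lrp-set-gateway-chassis lrp-ext chassis-a 20
  ovn-nbctl lrp-set-gateway-chassis lrp-ext chassis-b 10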

To do this we found the following solutions that were discussed in the past:
1. Implementing multithreading for ovsdb 
https://patchwork.ozlabs.org/project/openvswitch/list/?series===*=multithreading==
2. Changing the storage backend of OVN to an alternative (e.g. etcd) 
https://mail.openvswitch.org/pipermail/ovs-discuss/2016-July/041733.html

Both of these discussions are from 2016; I am not sure if more up-to-date ones exist.

I would like to ask if there are already existing discussions on scaling ovsdb 
further/faster?

From my perspective, whatever such a solution might be, it would no longer 
require relays and would allow the ovsdb servers to handle load gracefully.
I personally think that multithreading for ovsdb sounds quite promising, as it 
would allow us to separate the raft/cluster communication from the client 
connections.
This should allow us to keep the cluster healthy even under significant 
pressure from clients.

Thank you

--
Felix Huettner
