Re: [ovs-discuss] [ovn] recommended upgrade/restart procedure for ovn components

2021-08-19 Thread Numan Siddique
On Wed, Aug 18, 2021 at 1:47 PM Krzysztof Klimonda wrote:
>
> Hi Numan,
>
> On Wed, Aug 18, 2021, at 17:42, Numan Siddique wrote:
> > On Wed, Aug 18, 2021 at 3:55 AM Krzysztof Klimonda wrote:
> > >
> > > Hi,
> > >
> > > After reading the OVN upgrade documentation[1], my understanding is that
> > > the order of upgrading components is pretty important to ensure control
> > > plane and data plane stability. As I understand it, these are the upgrade
> > > steps:
> >
> > >
> > > 1. upgrade and restart ovn-controller on every chassis
> > > 2. upgrade ovn-nb-db and ovn-sb-db and migrate database schema
> > > 3. upgrade ovn-northd as the last component
> >
> > Even though this is the recommended procedure, I know that OpenStack
> > TripleO deployments and OpenShift upgrade ovn-northd and the
> > ovsdb-servers first.
> >
> >
> > >
> > > First, is the schema upgrade done by ovn-ctl somehow? It didn't upgrade
> > > the schema for me, and I had to run the "ovsdb-client migrate" command on
> > > both the northbound and southbound databases.
> >
> > I think ovn-ctl should take care of upgrading the database to the
> > updated schema. Before restarting the ovsdb-servers, were the OVN packages
> > (and therefore the schema files) already upgraded to the desired version?
> > If so, I think ovn-ctl should upgrade the database.
>
> Yes, those are Kolla containers, and after a restart we use a new image with
> new OVN packages. This is how Kolla starts the northbound DB:
> "/usr/share/ovn/scripts/ovn-ctl run_nb_ovsdb --db-nb-addr=172.16.0.213
> --db-nb-cluster-local-addr=172.16.0.213 --db-nb-sock=/run/ovn/ovnnb_db.sock
> --db-nb-pid=/run/ovn/ovnnb_db.pid
> --db-nb-file=/var/lib/openvswitch/ovn-nb/ovnnb.db
> --ovn-nb-logfile=/var/log/kolla/openvswitch/ovn-nb-db.log". I'll double-check
> whether I can figure out why the schema wasn't upgraded.


>
> >
> > >
> > > Second, in large deployments (250+ ovn-controllers), restarting the OVN
> > > southbound cluster nodes leads to a complete failure of the southbound
> > > database in my environment: once all ovn-controllers (and
> > > neutron-ovn-metadata-agents) start reconnecting to the cluster, the load
> > > they generate makes the cluster lose quorum or even corrupts the database
> > > on some nodes.
> >
> > With a lot of connections to the ovsdb-servers, they would definitely slow
> > down. Maybe you can restart the ovn-controllers in phases? Or pause all
> > ovn-controllers and then unpause them in a few groups so that the
> > ovsdb-servers are not overloaded.
> > I think we did something similar in one of our production-scale deployments.
>
> By pause do you mean "debug/pause"? Thanks, I'll check it out.

Yes.


>
> >
> >
> > > I'm running OVN 21.06 with ovsdb-server 2.14.0. Should I be upgrading to
> > > 2.15.x? I've also seen the new relay-based architecture introduced in the
> > > 2.16.0 release, but it seems to be a rather recent development and I'm
> > > worried about its stability (I've seen some reports about crashes and high
> > > memory usage).
> > >
> > > When running OVN scale tests with Kubernetes on hundreds of nodes, how
> > > are cluster upgrades handled?
> >
> > As I mentioned above, I think that in the case of OpenShift the master
> > nodes are upgraded first and then the worker nodes.
> > I think the worker nodes are paused during the master node upgrades.
> > My Kubernetes/OpenShift knowledge is limited, though.
>
> Thanks. Any thoughts on upgrading ovsdb-server to the 2.15.1 release? I see
> that there is a new database format; would that give any performance boost
> to the northbound and southbound clusters? Or should I just start looking
> into a relay-based southbound deployment to scale my cluster to 200+ nodes?

If you want to try the relay deployment, I'd suggest using 2.16.0.
I'm not really sure what improvements went into 2.15.1. If you can, I'd
suggest moving to 2.16.0.
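For reference, a minimal sketch of starting a southbound relay, assuming the ovsdb-server 2.16 relay syntax in which the database argument is `relay:<schema name>:<comma-separated upstream remotes>`. The function name, the pidfile path, port 16642 and the 192.0.2.x addresses in the usage line are hypothetical; the function only prints the command so it can be reviewed before running it.

```shell
#!/bin/sh
# Sketch (assumes ovsdb-server >= 2.16): print an ovsdb-server command that
# runs a relay for OVN_Southbound, so ovn-controllers can connect to the
# relay instead of directly to the Raft cluster members.
# The function name, pidfile path and any addresses/ports are hypothetical.

# sb_relay_cmd UPSTREAMS LISTEN_PORT
#   UPSTREAMS    comma-separated southbound cluster remotes (tcp:IP:PORT,...)
#   LISTEN_PORT  TCP port on which the relay accepts client connections
sb_relay_cmd() {
    local upstreams=$1 port=$2
    # A dedicated pidfile avoids clashing with another ovsdb-server
    # instance running on the same host.
    printf '%s\n' "ovsdb-server --detach --pidfile=/run/ovn/ovnsb_relay.pid --remote=ptcp:${port} relay:OVN_Southbound:${upstreams}"
}
```

For example, `sb_relay_cmd tcp:192.0.2.11:6642,tcp:192.0.2.12:6642 16642` prints a command for a relay listening on TCP port 16642. The chassis would then presumably have `external_ids:ovn-remote` in their Open_vSwitch table pointed at one or more relays rather than at the cluster members.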

Thanks
Numan



>
> Thanks
> Krzysztof
>
> >
> > Thanks
> > Numan
> >
> > >
> > > Regards,
> > > Krzysztof
> > >
> > > [1] https://docs.ovn.org/en/latest/intro/install/ovn-upgrades.html
> > >
> > > --
> > >   Krzysztof Klimonda
> > >   kklimo...@syntaxhighlighted.com
> > > ___
> > > discuss mailing list
> > > disc...@openvswitch.org
> > > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
> > >
> >


Re: [ovs-discuss] [ovn] recommended upgrade/restart procedure for ovn components

2021-08-18 Thread Krzysztof Klimonda
Hi Numan,

On Wed, Aug 18, 2021, at 17:42, Numan Siddique wrote:
> On Wed, Aug 18, 2021 at 3:55 AM Krzysztof Klimonda wrote:
> >
> > Hi,
> >
> > After reading the OVN upgrade documentation[1], my understanding is that
> > the order of upgrading components is pretty important to ensure control
> > plane and data plane stability. As I understand it, these are the upgrade
> > steps:
> 
> >
> > 1. upgrade and restart ovn-controller on every chassis
> > 2. upgrade ovn-nb-db and ovn-sb-db and migrate database schema
> > 3. upgrade ovn-northd as the last component
> 
> Even though this is the recommended procedure, I know that OpenStack
> TripleO deployments and OpenShift upgrade ovn-northd and the
> ovsdb-servers first.
> 
> 
> >
> > First, is the schema upgrade done by ovn-ctl somehow? It didn't upgrade
> > the schema for me, and I had to run the "ovsdb-client migrate" command on
> > both the northbound and southbound databases.
> 
> I think ovn-ctl should take care of upgrading the database to the
> updated schema. Before restarting the ovsdb-servers, were the OVN packages
> (and therefore the schema files) already upgraded to the desired version?
> If so, I think ovn-ctl should upgrade the database.

Yes, those are Kolla containers, and after a restart we use a new image with
new OVN packages. This is how Kolla starts the northbound DB:
"/usr/share/ovn/scripts/ovn-ctl run_nb_ovsdb --db-nb-addr=172.16.0.213
--db-nb-cluster-local-addr=172.16.0.213 --db-nb-sock=/run/ovn/ovnnb_db.sock
--db-nb-pid=/run/ovn/ovnnb_db.pid
--db-nb-file=/var/lib/openvswitch/ovn-nb/ovnnb.db
--ovn-nb-logfile=/var/log/kolla/openvswitch/ovn-nb-db.log". I'll double-check
whether I can figure out why the schema wasn't upgraded.
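To narrow this down, one option is to compare the schema version reported by the running server with the version in the schema file shipped in the new image. A minimal sketch, assuming the packaged schema is at /usr/share/ovn/ovn-nb.ovsschema (an assumption about the image layout) and reusing the socket path from the command above; `version_cmp` and `check_nb_schema` are hypothetical helpers, not part of OVS.

```shell
#!/bin/sh
# version_cmp A B: compare dotted schema versions such as "5.32.1".
# Prints -1 if A < B, 0 if equal, 1 if A > B; missing components count as 0.
version_cmp() {
    local a=$1 b=$2 ah bh
    while [ -n "$a" ] || [ -n "$b" ]; do
        # Leading numeric component of each version.
        ah=${a%%.*}
        bh=${b%%.*}
        [ -n "$ah" ] || ah=0
        [ -n "$bh" ] || bh=0
        if [ "$ah" -lt "$bh" ]; then printf '%s\n' -1; return 0; fi
        if [ "$ah" -gt "$bh" ]; then printf '%s\n' 1; return 0; fi
        # Drop the component that was just compared.
        case $a in *.*) a=${a#*.} ;; *) a= ;; esac
        case $b in *.*) b=${b#*.} ;; *) b= ;; esac
    done
    printf '%s\n' 0
}

# check_nb_schema [SOCKET] [SCHEMA_FILE]: report whether the running
# OVN_Northbound database is behind the packaged schema file.
# Default paths are assumptions (socket from the Kolla command line).
check_nb_schema() {
    local sock=${1:-/run/ovn/ovnnb_db.sock}
    local schema=${2:-/usr/share/ovn/ovn-nb.ovsschema}
    local running packaged
    running=$(ovsdb-client get-schema-version "unix:$sock" OVN_Northbound) || return 1
    packaged=$(ovsdb-tool schema-version "$schema") || return 1
    case $(version_cmp "$running" "$packaged") in
        -1) echo "database $running < packaged $packaged: conversion still needed" ;;
        0)  echo "database already at packaged version $packaged" ;;
        1)  echo "database $running > packaged $packaged: is the image older than the DB?" ;;
    esac
}
```

Running `check_nb_schema` inside the NB container should show whether the restart left the database on the old schema; the southbound check would be analogous, with OVN_Southbound and its own socket and schema file.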

> 
> >
> > Second, in large deployments (250+ ovn-controllers), restarting the OVN
> > southbound cluster nodes leads to a complete failure of the southbound
> > database in my environment: once all ovn-controllers (and
> > neutron-ovn-metadata-agents) start reconnecting to the cluster, the load
> > they generate makes the cluster lose quorum or even corrupts the database
> > on some nodes.
> 
> With a lot of connections to the ovsdb-servers, they would definitely slow
> down. Maybe you can restart the ovn-controllers in phases? Or pause all
> ovn-controllers and then unpause them in a few groups so that the
> ovsdb-servers are not overloaded.
> I think we did something similar in one of our production-scale deployments.

By pause do you mean "debug/pause"? Thanks, I'll check it out.

> 
> 
> > I'm running OVN 21.06 with ovsdb-server 2.14.0. Should I be upgrading to
> > 2.15.x? I've also seen the new relay-based architecture introduced in the
> > 2.16.0 release, but it seems to be a rather recent development and I'm
> > worried about its stability (I've seen some reports about crashes and high
> > memory usage).
> >
> > When running OVN scale tests with Kubernetes on hundreds of nodes, how
> > are cluster upgrades handled?
> 
> As I mentioned above, I think that in the case of OpenShift the master
> nodes are upgraded first and then the worker nodes.
> I think the worker nodes are paused during the master node upgrades.
> My Kubernetes/OpenShift knowledge is limited, though.

Thanks. Any thoughts on upgrading ovsdb-server to the 2.15.1 release? I see
that there is a new database format; would that give any performance boost
to the northbound and southbound clusters? Or should I just start looking
into a relay-based southbound deployment to scale my cluster to 200+ nodes?

Thanks
Krzysztof

> 
> Thanks
> Numan
> 
> >
> > Regards,
> > Krzysztof
> >
> > [1] https://docs.ovn.org/en/latest/intro/install/ovn-upgrades.html
> >
> 


-- 
  Krzysztof Klimonda
  kklimo...@syntaxhighlighted.com


Re: [ovs-discuss] [ovn] recommended upgrade/restart procedure for ovn components

2021-08-18 Thread Numan Siddique
On Wed, Aug 18, 2021 at 3:55 AM Krzysztof Klimonda wrote:
>
> Hi,
>
> After reading the OVN upgrade documentation[1], my understanding is that the
> order of upgrading components is pretty important to ensure control plane
> and data plane stability. As I understand it, these are the upgrade steps:

>
> 1. upgrade and restart ovn-controller on every chassis
> 2. upgrade ovn-nb-db and ovn-sb-db and migrate database schema
> 3. upgrade ovn-northd as the last component

Even though this is the recommended procedure, I know that OpenStack
TripleO deployments and OpenShift upgrade ovn-northd and the
ovsdb-servers first.


>
> First, is the schema upgrade done by ovn-ctl somehow? It didn't upgrade the
> schema for me, and I had to run the "ovsdb-client migrate" command on both
> the northbound and southbound databases.

I think ovn-ctl should take care of upgrading the database to the
updated schema. Before restarting the ovsdb-servers, were the OVN packages
(and therefore the schema files) already upgraded to the desired version?
If so, I think ovn-ctl should upgrade the database.
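If the database was not converted at start-up, the online conversion can also be done by hand. A minimal sketch built on `ovsdb-client needs-conversion` (which reports whether the served database differs from a schema file) and `ovsdb-client convert` (which converts it in place while the server keeps running); `convert_if_needed` and the OVSDB_CLIENT override, there only to allow a dry run, are hypothetical.

```shell
#!/bin/sh
# convert_if_needed SERVER SCHEMA_FILE
# Convert the database served at SERVER (e.g. unix:/run/ovn/ovnnb_db.sock)
# to SCHEMA_FILE, but only if ovsdb-client reports that a conversion is needed.
# OVSDB_CLIENT (hypothetical hook) can replace the client for a dry run.
convert_if_needed() {
    local server=$1 schema=$2 client=${OVSDB_CLIENT:-ovsdb-client} needs
    # needs-conversion prints "yes" or "no".
    needs=$($client needs-conversion "$server" "$schema") || return 1
    if [ "$needs" = yes ]; then
        $client convert "$server" "$schema"
    else
        echo "$server already matches $schema"
    fi
}
```

For example, `convert_if_needed unix:/run/ovn/ovnnb_db.sock /usr/share/ovn/ovn-nb.ovsschema` for the northbound DB, and the same with the southbound socket and schema; the schema path is an assumption about the packaging.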


>
> Second, in large deployments (250+ ovn-controllers), restarting the OVN
> southbound cluster nodes leads to a complete failure of the southbound
> database in my environment: once all ovn-controllers (and
> neutron-ovn-metadata-agents) start reconnecting to the cluster, the load
> they generate makes the cluster lose quorum or even corrupts the database on
> some nodes.

With a lot of connections to the ovsdb-servers, they would definitely slow
down. Maybe you can restart the ovn-controllers in phases? Or pause all
ovn-controllers and then unpause them in a few groups so that the
ovsdb-servers are not overloaded.
I think we did something similar in one of our production-scale deployments.
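A minimal sketch of the pause/resume approach, using ovn-controller's `debug/pause` and `debug/resume` unixctl commands via ovn-appctl. How each chassis is reached is an assumption: the hypothetical REMOTE hook defaults to plain ssh, and in containerized deployments it would have to wrap the container exec instead. As far as I understand, a paused ovn-controller stops processing (so it does not resync with the southbound DB), while the flows already installed in OVS stay in place.

```shell
#!/bin/sh
# Sketch: pause every ovn-controller before restarting the southbound cluster,
# then resume them in small batches so the ovsdb-servers are not hit by all
# resyncs at once. REMOTE is a hypothetical hook for reaching a chassis.

# pause_all HOST...: pause ovn-controller on every listed chassis.
pause_all() {
    local host
    for host in "$@"; do
        ${REMOTE:-ssh} "$host" ovn-appctl -t ovn-controller debug/pause
    done
}

# resume_in_batches BATCH_SIZE DELAY_SECONDS HOST...
# Resume ovn-controller on BATCH_SIZE chassis at a time, waiting
# DELAY_SECONDS between batches so each group can resync first.
resume_in_batches() {
    local batch=$1 delay=$2 n=0 host
    shift 2
    for host in "$@"; do
        ${REMOTE:-ssh} "$host" ovn-appctl -t ovn-controller debug/resume
        n=$((n + 1))
        # Sleep after each full batch, but not after the last host.
        if [ $((n % batch)) -eq 0 ] && [ "$n" -lt "$#" ]; then
            sleep "$delay"
        fi
    done
}
```

For example, `pause_all $(cat chassis.txt)`, restart the southbound cluster, then `resume_in_batches 20 60 $(cat chassis.txt)`. The chassis.txt file, the batch size of 20 and the 60-second delay are made-up values to tune per deployment.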


> I'm running OVN 21.06 with ovsdb-server 2.14.0. Should I be upgrading to
> 2.15.x? I've also seen the new relay-based architecture introduced in the
> 2.16.0 release, but it seems to be a rather recent development and I'm
> worried about its stability (I've seen some reports about crashes and high
> memory usage).
>
> When running OVN scale tests with Kubernetes on hundreds of nodes, how are
> cluster upgrades handled?

As I mentioned above, I think that in the case of OpenShift the master
nodes are upgraded first and then the worker nodes.
I think the worker nodes are paused during the master node upgrades.
My Kubernetes/OpenShift knowledge is limited, though.

Thanks
Numan

>
> Regards,
> Krzysztof
>
> [1] https://docs.ovn.org/en/latest/intro/install/ovn-upgrades.html
>


[ovs-discuss] [ovn] recommended upgrade/restart procedure for ovn components

2021-08-18 Thread Krzysztof Klimonda
Hi,

After reading the OVN upgrade documentation[1], my understanding is that the
order of upgrading components is pretty important to ensure control plane and
data plane stability. As I understand it, these are the upgrade steps:

1. upgrade and restart ovn-controller on every chassis
2. upgrade ovn-nb-db and ovn-sb-db and migrate database schema
3. upgrade ovn-northd as the last component
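The three steps above could be turned into a checklist roughly as follows. This is only a sketch that prints the plan and runs nothing; `upgrade_plan`, the host names and the wording of each step are hypothetical, and the real restart mechanism (systemd units, Kolla containers, ...) depends on the deployment.

```shell
#!/bin/sh
# upgrade_plan "CENTRAL_NODES" CHASSIS...
# Print the documented order: every chassis first, then the NB/SB
# ovsdb-servers on the (space-separated) central nodes, ovn-northd last.
upgrade_plan() {
    local central=$1 step=1 host
    shift
    for host in "$@"; do
        echo "$step. $host: upgrade OVN packages and restart ovn-controller"
        step=$((step + 1))
    done
    for host in $central; do
        echo "$step. $host: upgrade OVN packages and restart the NB and SB ovsdb-servers"
        step=$((step + 1))
    done
    echo "$step. convert the NB and SB databases to the new schema if needed"
    step=$((step + 1))
    for host in $central; do
        echo "$step. $host: restart ovn-northd on the new version"
        step=$((step + 1))
    done
}
```

For example, `upgrade_plan "db1 db2 db3" compute1 compute2` lists the two chassis first, then the three central nodes' ovsdb-servers, the schema conversion, and finally ovn-northd.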

First, is the schema upgrade done by ovn-ctl somehow? It didn't upgrade the
schema for me, and I had to run the "ovsdb-client migrate" command on both the
northbound and southbound databases.

Second, in large deployments (250+ ovn-controllers), restarting the OVN
southbound cluster nodes leads to a complete failure of the southbound database
in my environment: once all ovn-controllers (and neutron-ovn-metadata-agents)
start reconnecting to the cluster, the load they generate makes the cluster
lose quorum or even corrupts the database on some nodes.

I'm running OVN 21.06 with ovsdb-server 2.14.0. Should I be upgrading to
2.15.x? I've also seen the new relay-based architecture introduced in the
2.16.0 release, but it seems to be a rather recent development and I'm worried
about its stability (I've seen some reports about crashes and high memory
usage).

When running OVN scale tests with Kubernetes on hundreds of nodes, how are
cluster upgrades handled?

Regards,
Krzysztof

[1] https://docs.ovn.org/en/latest/intro/install/ovn-upgrades.html

-- 
  Krzysztof Klimonda
  kklimo...@syntaxhighlighted.com