Re: [ovs-dev] [PATCH] ovsdb-cs: Avoid unnecessary re-connections when updating remotes.

2021-07-15 Thread Ilya Maximets
On 7/15/21 8:31 PM, Dumitru Ceara wrote:
> On 7/15/21 7:37 PM, Han Zhou wrote:
>> On Thu, Jul 15, 2021 at 9:17 AM Ilya Maximets  wrote:
>>>
>>> On 6/29/21 9:57 PM, Ilya Maximets wrote:
>> Regarding the current patch, I think it's better to add a test case to
>> cover the scenario and confirm that existing connections didn't
>> reset. With
>> that:
>> Acked-by: Han Zhou 

 I'll work on a unit test for this.
>>>
>>> Hi.  Here is a unit test that I came up with:
>>>
>>> diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at
>>> index 62181dd4d..e32f9ec89 100644
>>> --- a/tests/ovsdb-idl.at
>>> +++ b/tests/ovsdb-idl.at
>>> @@ -2282,3 +2282,27 @@ OVSDB_CHECK_CLUSTER_IDL_C([simple idl,
>> monitor_cond_since, cluster disconnect],
>>>  008: table simple: i=1 r=2 b=true s= u=<0> ia=[] ra=[] ba=[] sa=[] ua=[]
>> uuid=<2>
>>>  009: done
>>>  ]])
>>> +
>>> +dnl This test checks that IDL keeps the existing connection to the
>> server if
>>> +dnl it's still on a list of remotes after update.
>>> +OVSDB_CHECK_IDL_C([simple idl, initially empty, set remotes],
>>> +  [],
>>> +  [['set-remote unix:socket' \
>>> +'+set-remote unix:bad_socket,unix:socket' \
>>> +'+set-remote unix:bad_socket' \
>>> +'+set-remote unix:socket' \
>>> +'set-remote unix:bad_socket,unix:socket' \
>>> +'+set-remote unix:socket' \
>>> +'+reconnect']],
>>> +  [[000: empty
>>> +001: new remotes: unix:socket, is connected: true
>>> +002: new remotes: unix:bad_socket,unix:socket, is connected: true
>>> +003: new remotes: unix:bad_socket, is connected: false
>>> +004: new remotes: unix:socket, is connected: false
>>> +005: empty
>>> +006: new remotes: unix:bad_socket,unix:socket, is connected: true
>>> +007: new remotes: unix:socket, is connected: true
>>> +008: reconnect
>>> +009: empty
>>> +010: done
>>> +]])
>>> diff --git a/tests/test-ovsdb.c b/tests/test-ovsdb.c
>>> index a886f971e..93329cd4c 100644
>>> --- a/tests/test-ovsdb.c
>>> +++ b/tests/test-ovsdb.c
>>> @@ -2621,6 +2621,7 @@ do_idl(struct ovs_cmdl_context *ctx)
>>>  setvbuf(stdout, NULL, _IONBF, 0);
>>>
>>>  symtab = ovsdb_symbol_table_create();
>>> +const char remote_s[] = "set-remote ";
>>>  const char cond_s[] = "condition ";
>>>  if (ctx->argc > 2 && strstr(ctx->argv[2], cond_s)) {
>>>  update_conditions(idl, ctx->argv[2] + strlen(cond_s));
>>> @@ -2664,6 +2665,11 @@ do_idl(struct ovs_cmdl_context *ctx)
>>>  if (!strcmp(arg, "reconnect")) {
>>>  print_and_log("%03d: reconnect", step++);
>>>  ovsdb_idl_force_reconnect(idl);
>>> +}  else if (!strncmp(arg, remote_s, strlen(remote_s))) {
>>> +ovsdb_idl_set_remote(idl, arg + strlen(remote_s), true);
>>> +print_and_log("%03d: new remotes: %s, is connected: %s",
>> step++,
>>> +  arg + strlen(remote_s),
>>> +  ovsdb_idl_is_connected(idl) ? "true" :
>> "false");
>>>  }  else if (!strncmp(arg, cond_s, strlen(cond_s))) {
>>>  update_conditions(idl, arg + strlen(cond_s));
>>>  print_and_log("%03d: change conditions", step++);
>>> ---
>>>
>>> Dumitru, Han, if it looks good to you, I can squash it in before
>>> applying the patch.  What do you think?
>>
>> Thanks Ilya. LGTM.
> 
> Looks good to me too, thanks!

Applied.  Thanks!

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] ovsdb-cs: Avoid unnecessary re-connections when updating remotes.

2021-07-15 Thread Dumitru Ceara
On 7/15/21 7:37 PM, Han Zhou wrote:
> On Thu, Jul 15, 2021 at 9:17 AM Ilya Maximets  wrote:
>>
>> On 6/29/21 9:57 PM, Ilya Maximets wrote:
> Regarding the current patch, I think it's better to add a test case to
> cover the scenario and confirm that existing connections didn't
> reset. With
> that:
> Acked-by: Han Zhou 
>>>
>>> I'll work on a unit test for this.
>>
>> Hi.  Here is a unit test that I came up with:
>>
>> diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at
>> index 62181dd4d..e32f9ec89 100644
>> --- a/tests/ovsdb-idl.at
>> +++ b/tests/ovsdb-idl.at
>> @@ -2282,3 +2282,27 @@ OVSDB_CHECK_CLUSTER_IDL_C([simple idl,
> monitor_cond_since, cluster disconnect],
>>  008: table simple: i=1 r=2 b=true s= u=<0> ia=[] ra=[] ba=[] sa=[] ua=[]
> uuid=<2>
>>  009: done
>>  ]])
>> +
>> +dnl This test checks that IDL keeps the existing connection to the
> server if
>> +dnl it's still on a list of remotes after update.
>> +OVSDB_CHECK_IDL_C([simple idl, initially empty, set remotes],
>> +  [],
>> +  [['set-remote unix:socket' \
>> +'+set-remote unix:bad_socket,unix:socket' \
>> +'+set-remote unix:bad_socket' \
>> +'+set-remote unix:socket' \
>> +'set-remote unix:bad_socket,unix:socket' \
>> +'+set-remote unix:socket' \
>> +'+reconnect']],
>> +  [[000: empty
>> +001: new remotes: unix:socket, is connected: true
>> +002: new remotes: unix:bad_socket,unix:socket, is connected: true
>> +003: new remotes: unix:bad_socket, is connected: false
>> +004: new remotes: unix:socket, is connected: false
>> +005: empty
>> +006: new remotes: unix:bad_socket,unix:socket, is connected: true
>> +007: new remotes: unix:socket, is connected: true
>> +008: reconnect
>> +009: empty
>> +010: done
>> +]])
>> diff --git a/tests/test-ovsdb.c b/tests/test-ovsdb.c
>> index a886f971e..93329cd4c 100644
>> --- a/tests/test-ovsdb.c
>> +++ b/tests/test-ovsdb.c
>> @@ -2621,6 +2621,7 @@ do_idl(struct ovs_cmdl_context *ctx)
>>  setvbuf(stdout, NULL, _IONBF, 0);
>>
>>  symtab = ovsdb_symbol_table_create();
>> +const char remote_s[] = "set-remote ";
>>  const char cond_s[] = "condition ";
>>  if (ctx->argc > 2 && strstr(ctx->argv[2], cond_s)) {
>>  update_conditions(idl, ctx->argv[2] + strlen(cond_s));
>> @@ -2664,6 +2665,11 @@ do_idl(struct ovs_cmdl_context *ctx)
>>  if (!strcmp(arg, "reconnect")) {
>>  print_and_log("%03d: reconnect", step++);
>>  ovsdb_idl_force_reconnect(idl);
>> +}  else if (!strncmp(arg, remote_s, strlen(remote_s))) {
>> +ovsdb_idl_set_remote(idl, arg + strlen(remote_s), true);
>> +print_and_log("%03d: new remotes: %s, is connected: %s",
> step++,
>> +  arg + strlen(remote_s),
>> +  ovsdb_idl_is_connected(idl) ? "true" :
> "false");
>>  }  else if (!strncmp(arg, cond_s, strlen(cond_s))) {
>>  update_conditions(idl, arg + strlen(cond_s));
>>  print_and_log("%03d: change conditions", step++);
>> ---
>>
>> Dumitru, Han, if it looks good to you, I can squash it in before
>> applying the patch.  What do you think?
> 
> Thanks Ilya. LGTM.

Looks good to me too, thanks!

> 
>>
>> Best regards, Ilya Maximets.
> 



Re: [ovs-dev] [PATCH] ovsdb-cs: Avoid unnecessary re-connections when updating remotes.

2021-07-15 Thread Han Zhou
On Thu, Jul 15, 2021 at 9:17 AM Ilya Maximets  wrote:
>
> On 6/29/21 9:57 PM, Ilya Maximets wrote:
> >>> Regarding the current patch, I think it's better to add a test case to
> >>> cover the scenario and confirm that existing connections didn't
reset. With
> >>> that:
> >>> Acked-by: Han Zhou 
> >
> > I'll work on a unit test for this.
>
> Hi.  Here is a unit test that I came up with:
>
> diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at
> index 62181dd4d..e32f9ec89 100644
> --- a/tests/ovsdb-idl.at
> +++ b/tests/ovsdb-idl.at
> @@ -2282,3 +2282,27 @@ OVSDB_CHECK_CLUSTER_IDL_C([simple idl,
monitor_cond_since, cluster disconnect],
>  008: table simple: i=1 r=2 b=true s= u=<0> ia=[] ra=[] ba=[] sa=[] ua=[]
uuid=<2>
>  009: done
>  ]])
> +
> +dnl This test checks that IDL keeps the existing connection to the
server if
> +dnl it's still on a list of remotes after update.
> +OVSDB_CHECK_IDL_C([simple idl, initially empty, set remotes],
> +  [],
> +  [['set-remote unix:socket' \
> +'+set-remote unix:bad_socket,unix:socket' \
> +'+set-remote unix:bad_socket' \
> +'+set-remote unix:socket' \
> +'set-remote unix:bad_socket,unix:socket' \
> +'+set-remote unix:socket' \
> +'+reconnect']],
> +  [[000: empty
> +001: new remotes: unix:socket, is connected: true
> +002: new remotes: unix:bad_socket,unix:socket, is connected: true
> +003: new remotes: unix:bad_socket, is connected: false
> +004: new remotes: unix:socket, is connected: false
> +005: empty
> +006: new remotes: unix:bad_socket,unix:socket, is connected: true
> +007: new remotes: unix:socket, is connected: true
> +008: reconnect
> +009: empty
> +010: done
> +]])
> diff --git a/tests/test-ovsdb.c b/tests/test-ovsdb.c
> index a886f971e..93329cd4c 100644
> --- a/tests/test-ovsdb.c
> +++ b/tests/test-ovsdb.c
> @@ -2621,6 +2621,7 @@ do_idl(struct ovs_cmdl_context *ctx)
>  setvbuf(stdout, NULL, _IONBF, 0);
>
>  symtab = ovsdb_symbol_table_create();
> +const char remote_s[] = "set-remote ";
>  const char cond_s[] = "condition ";
>  if (ctx->argc > 2 && strstr(ctx->argv[2], cond_s)) {
>  update_conditions(idl, ctx->argv[2] + strlen(cond_s));
> @@ -2664,6 +2665,11 @@ do_idl(struct ovs_cmdl_context *ctx)
>  if (!strcmp(arg, "reconnect")) {
>  print_and_log("%03d: reconnect", step++);
>  ovsdb_idl_force_reconnect(idl);
> +}  else if (!strncmp(arg, remote_s, strlen(remote_s))) {
> +ovsdb_idl_set_remote(idl, arg + strlen(remote_s), true);
> +print_and_log("%03d: new remotes: %s, is connected: %s",
step++,
> +  arg + strlen(remote_s),
> +  ovsdb_idl_is_connected(idl) ? "true" :
"false");
>  }  else if (!strncmp(arg, cond_s, strlen(cond_s))) {
>  update_conditions(idl, arg + strlen(cond_s));
>  print_and_log("%03d: change conditions", step++);
> ---
>
> Dumitru, Han, if it looks good to you, I can squash it in before
> applying the patch.  What do you think?

Thanks Ilya. LGTM.

>
> Best regards, Ilya Maximets.


Re: [ovs-dev] [PATCH] ovsdb-cs: Avoid unnecessary re-connections when updating remotes.

2021-07-15 Thread Ilya Maximets
On 6/29/21 9:57 PM, Ilya Maximets wrote:
>>> Regarding the current patch, I think it's better to add a test case to
>>> cover the scenario and confirm that existing connections didn't reset. With
>>> that:
>>> Acked-by: Han Zhou 
> 
> I'll work on a unit test for this.

Hi.  Here is a unit test that I came up with:

diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at
index 62181dd4d..e32f9ec89 100644
--- a/tests/ovsdb-idl.at
+++ b/tests/ovsdb-idl.at
@@ -2282,3 +2282,27 @@ OVSDB_CHECK_CLUSTER_IDL_C([simple idl, 
monitor_cond_since, cluster disconnect],
 008: table simple: i=1 r=2 b=true s= u=<0> ia=[] ra=[] ba=[] sa=[] ua=[] 
uuid=<2>
 009: done
 ]])
+
+dnl This test checks that IDL keeps the existing connection to the server if
+dnl it's still on a list of remotes after update.
+OVSDB_CHECK_IDL_C([simple idl, initially empty, set remotes],
+  [],
+  [['set-remote unix:socket' \
+'+set-remote unix:bad_socket,unix:socket' \
+'+set-remote unix:bad_socket' \
+'+set-remote unix:socket' \
+'set-remote unix:bad_socket,unix:socket' \
+'+set-remote unix:socket' \
+'+reconnect']],
+  [[000: empty
+001: new remotes: unix:socket, is connected: true
+002: new remotes: unix:bad_socket,unix:socket, is connected: true
+003: new remotes: unix:bad_socket, is connected: false
+004: new remotes: unix:socket, is connected: false
+005: empty
+006: new remotes: unix:bad_socket,unix:socket, is connected: true
+007: new remotes: unix:socket, is connected: true
+008: reconnect
+009: empty
+010: done
+]])
diff --git a/tests/test-ovsdb.c b/tests/test-ovsdb.c
index a886f971e..93329cd4c 100644
--- a/tests/test-ovsdb.c
+++ b/tests/test-ovsdb.c
@@ -2621,6 +2621,7 @@ do_idl(struct ovs_cmdl_context *ctx)
 setvbuf(stdout, NULL, _IONBF, 0);
 
 symtab = ovsdb_symbol_table_create();
+const char remote_s[] = "set-remote ";
 const char cond_s[] = "condition ";
 if (ctx->argc > 2 && strstr(ctx->argv[2], cond_s)) {
 update_conditions(idl, ctx->argv[2] + strlen(cond_s));
@@ -2664,6 +2665,11 @@ do_idl(struct ovs_cmdl_context *ctx)
 if (!strcmp(arg, "reconnect")) {
 print_and_log("%03d: reconnect", step++);
 ovsdb_idl_force_reconnect(idl);
+}  else if (!strncmp(arg, remote_s, strlen(remote_s))) {
+ovsdb_idl_set_remote(idl, arg + strlen(remote_s), true);
+print_and_log("%03d: new remotes: %s, is connected: %s", step++,
+  arg + strlen(remote_s),
+  ovsdb_idl_is_connected(idl) ? "true" : "false");
 }  else if (!strncmp(arg, cond_s, strlen(cond_s))) {
 update_conditions(idl, arg + strlen(cond_s));
 print_and_log("%03d: change conditions", step++);
---

Dumitru, Han, if it looks good to you, I can squash it in before
applying the patch.  What do you think?

Best regards, Ilya Maximets.


Re: [ovs-dev] [PATCH] ovsdb-cs: Avoid unnecessary re-connections when updating remotes.

2021-06-29 Thread Ben Pfaff
On Tue, Jun 29, 2021 at 09:57:44PM +0200, Ilya Maximets wrote:
> For the more or less automatic ways of solving the imbalance, there are
> a few more ideas that we can explore:
> 
> - Try to measure the load on the ovsdb-server process and report it
>   somehow in the _Server database, so the client might make a decision
>   to re-connect to a less loaded server.  This might be some metric
>   based on total number of clients or the time it takes to run a
>   single event processing loop (poll interval).

The servers could report their individual loads to each other via the
leader, and report all the server loads in _Server.  This has pitfalls
too, of course.


Re: [ovs-dev] [PATCH] ovsdb-cs: Avoid unnecessary re-connections when updating remotes.

2021-06-29 Thread Ilya Maximets
On 6/29/21 8:05 PM, Ben Pfaff wrote:
> On Tue, Jun 29, 2021 at 10:29:59AM -0700, Han Zhou wrote:
>> On Tue, Jun 29, 2021 at 8:43 AM Ben Pfaff  wrote:
>>>
>>> On Tue, Jun 29, 2021 at 12:56:18PM +0200, Ilya Maximets wrote:
 If a new database server is added to the cluster, or if one of the
 database servers changes its IP address or port, then you need to
 update the list of remotes for the client.  For example, if a new
 OVN_Southbound database server is added, you need to update the
 ovn-remote for the ovn-controller.

 However, in the current implementation, the ovsdb-cs module always
 closes the current connection and creates a new one.  This can lead
 to a storm of re-connections if all ovn-controllers are updated
 simultaneously.  They can also start re-downloading the database
 content, creating even more load on the database servers.

 Correct this by saving an existing connection if it is still in the
 list of remotes after the update.

 'reconnect' module will report connection state updates, but that
 is OK since no real re-connection happened and we only updated the
 state of a new 'reconnect' instance.

 If required, re-connection can be forced after the update of remotes
 with ovsdb_cs_force_reconnect().
>>>
>>> I think one of the goals here was to keep the load balanced as servers
>>> are added.

Yes, I thought about that and that is a valid point.  It's more like
a trade-off here between stability of connections and trying to keep
the load balanced in some way.

>>> Maybe that's not a big deal, or maybe it would make sense to
>>> flip a coin for each of the new servers and switch over to it with
>>> probability 1/n where n is the number of servers.

That seems like an interesting approach, but I think the resulting
probability of keeping the connection would be low.

>>
>> A similar load-balancing problem exists also when a server is down and then
>> recovered. Connections will obviously move away when it is down but they
>> won't automatically connect back when it is recovered. Apart from the
>> flipping-a-coin approach suggested by Ben, I saw a proposal [0] [1] in the
>> past that provides a CLI to reconnect to a specific server which leaves
>> this burden to CMS/operators. It is not ideal but still could be an
>> alternative to solve the problem.

I remember these patches.  And I think that the imbalance after one of the
servers goes down and up again (e.g. temporary disconnection of one of
the cluster nodes) is a more important issue and at the same time harder
to solve, because this happens automatically without intervention from
the user/CMS's side.  And to some extent it's inevitable.  E.g. the cluster
will almost always be unbalanced if 3 server nodes are restarted one by
one for an upgrade.  Luckily, worker nodes with ovn-controllers need
maintenance too, so eventual load balance will be achieved.

One interesting side effect of the current patch is that you can mimic
the behavior of patches [0][1] like this:
  set ovn-remote=<only the target server>
  set ovn-remote=<the full list of servers>
After the first command, the ovn-controller will re-connect to the new
server, and it will not re-connect again after the addition of all other
servers back to the list.  But I agree that this looks more like a hack
than an actual way to do that.

For the more or less automatic ways of solving the imbalance, there are
a few more ideas that we can explore:

- Try to measure the load on the ovsdb-server process and report it
  somehow in the _Server database, so the client might make a decision
  to re-connect to a less loaded server.  This might be some metric
  based on total number of clients or the time it takes to run a
  single event processing loop (poll interval).

- A bit more controlled way is to limit the number of clients per server,
  so the server will decline connection attempts.  The CMS might have an
  idea how many clients one server is able/allowed to handle.
  E.g. for N servers and M clients, it might be reasonable to allow
  no more than 2M/N connections per server, to still be able to serve
  all clients if half of the servers are down.  Of course, it's up to
  the CMS/user to decide on the exact number.  This could be implemented
  as an extra column for the connection row in the database.

>>
>> I think both approaches have their pros and cons. The smart way doesn't
>> require human intervention in theory, but when operating at scale people
>> usually want to be cautious and have more control over the changes. For
>> example, they may want to add the server to the cluster first, and then
>> gradually move 1/n connections to the new server after a graceful period,
>> or they could be more conservative and only let the new server take new
>> connections without moving any existing connections. I'd support both
>> options and let the operators decide according to their requirements.

This sounds reasonable.

>>
>> Regarding the current patch, I think it's better to add a test 

Re: [ovs-dev] [PATCH] ovsdb-cs: Avoid unnecessary re-connections when updating remotes.

2021-06-29 Thread Ben Pfaff
On Tue, Jun 29, 2021 at 10:29:59AM -0700, Han Zhou wrote:
> On Tue, Jun 29, 2021 at 8:43 AM Ben Pfaff  wrote:
> >
> > On Tue, Jun 29, 2021 at 12:56:18PM +0200, Ilya Maximets wrote:
> > > If a new database server is added to the cluster, or if one of the
> > > database servers changes its IP address or port, then you need to
> > > update the list of remotes for the client.  For example, if a new
> > > OVN_Southbound database server is added, you need to update the
> > > ovn-remote for the ovn-controller.
> > >
> > > However, in the current implementation, the ovsdb-cs module always
> > > closes the current connection and creates a new one.  This can lead
> > > to a storm of re-connections if all ovn-controllers are updated
> > > simultaneously.  They can also start re-downloading the database
> > > content, creating even more load on the database servers.
> > >
> > > Correct this by saving an existing connection if it is still in the
> > > list of remotes after the update.
> > >
> > > 'reconnect' module will report connection state updates, but that
> > > is OK since no real re-connection happened and we only updated the
> > > state of a new 'reconnect' instance.
> > >
> > > If required, re-connection can be forced after the update of remotes
> > > with ovsdb_cs_force_reconnect().
> >
> > I think one of the goals here was to keep the load balanced as servers
> > are added.  Maybe that's not a big deal, or maybe it would make sense to
> > flip a coin for each of the new servers and switch over to it with
> > probability 1/n where n is the number of servers.
> 
> A similar load-balancing problem exists also when a server is down and then
> recovered. Connections will obviously move away when it is down but they
> won't automatically connect back when it is recovered. Apart from the
> flipping-a-coin approach suggested by Ben, I saw a proposal [0] [1] in the
> past that provides a CLI to reconnect to a specific server which leaves
> this burden to CMS/operators. It is not ideal but still could be an
> alternative to solve the problem.
> 
> I think both approaches have their pros and cons. The smart way doesn't
> require human intervention in theory, but when operating at scale people
> usually want to be cautious and have more control over the changes. For
> example, they may want to add the server to the cluster first, and then
> gradually move 1/n connections to the new server after a graceful period,
> or they could be more conservative and only let the new server take new
> connections without moving any existing connections. I'd support both
> options and let the operators decide according to their requirements.
> 
> Regarding the current patch, I think it's better to add a test case to
> cover the scenario and confirm that existing connections didn't reset. With
> that:
> Acked-by: Han Zhou 

This seems reasonable; to be sure, I'm not arguing against Ilya's
approach, just trying to explain my recollection of why it was done
this way.


Re: [ovs-dev] [PATCH] ovsdb-cs: Avoid unnecessary re-connections when updating remotes.

2021-06-29 Thread Han Zhou
On Tue, Jun 29, 2021 at 8:43 AM Ben Pfaff  wrote:
>
> On Tue, Jun 29, 2021 at 12:56:18PM +0200, Ilya Maximets wrote:
> > If a new database server is added to the cluster, or if one of the
> > database servers changes its IP address or port, then you need to
> > update the list of remotes for the client.  For example, if a new
> > OVN_Southbound database server is added, you need to update the
> > ovn-remote for the ovn-controller.
> >
> > However, in the current implementation, the ovsdb-cs module always
> > closes the current connection and creates a new one.  This can lead
> > to a storm of re-connections if all ovn-controllers are updated
> > simultaneously.  They can also start re-downloading the database
> > content, creating even more load on the database servers.
> >
> > Correct this by saving an existing connection if it is still in the
> > list of remotes after the update.
> >
> > 'reconnect' module will report connection state updates, but that
> > is OK since no real re-connection happened and we only updated the
> > state of a new 'reconnect' instance.
> >
> > If required, re-connection can be forced after the update of remotes
> > with ovsdb_cs_force_reconnect().
>
> I think one of the goals here was to keep the load balanced as servers
> are added.  Maybe that's not a big deal, or maybe it would make sense to
> flip a coin for each of the new servers and switch over to it with
> probability 1/n where n is the number of servers.

A similar load-balancing problem exists also when a server is down and then
recovered. Connections will obviously move away when it is down but they
won't automatically connect back when it is recovered. Apart from the
flipping-a-coin approach suggested by Ben, I saw a proposal [0] [1] in the
past that provides a CLI to reconnect to a specific server which leaves
this burden to CMS/operators. It is not ideal but still could be an
alternative to solve the problem.

I think both approaches have their pros and cons. The smart way doesn't
require human intervention in theory, but when operating at scale people
usually want to be cautious and have more control over the changes. For
example, they may want to add the server to the cluster first, and then
gradually move 1/n connections to the new server after a graceful period,
or they could be more conservative and only let the new server take new
connections without moving any existing connections. I'd support both
options and let the operators decide according to their requirements.

Regarding the current patch, I think it's better to add a test case to
cover the scenario and confirm that existing connections didn't reset. With
that:
Acked-by: Han Zhou 

[0] https://mail.openvswitch.org/pipermail/ovs-dev/2020-August/373895.html
[1] https://mail.openvswitch.org/pipermail/ovs-dev/2020-August/374237.html


Re: [ovs-dev] [PATCH] ovsdb-cs: Avoid unnecessary re-connections when updating remotes.

2021-06-29 Thread Ben Pfaff
On Tue, Jun 29, 2021 at 12:56:18PM +0200, Ilya Maximets wrote:
> If a new database server is added to the cluster, or if one of the
> database servers changes its IP address or port, then you need to
> update the list of remotes for the client.  For example, if a new
> OVN_Southbound database server is added, you need to update the
> ovn-remote for the ovn-controller.
> 
> However, in the current implementation, the ovsdb-cs module always
> closes the current connection and creates a new one.  This can lead
> to a storm of re-connections if all ovn-controllers are updated
> simultaneously.  They can also start re-downloading the database
> content, creating even more load on the database servers.
> 
> Correct this by saving an existing connection if it is still in the
> list of remotes after the update.
> 
> 'reconnect' module will report connection state updates, but that
> is OK since no real re-connection happened and we only updated the
> state of a new 'reconnect' instance.
> 
> If required, re-connection can be forced after the update of remotes
> with ovsdb_cs_force_reconnect().

I think one of the goals here was to keep the load balanced as servers
are added.  Maybe that's not a big deal, or maybe it would make sense to
flip a coin for each of the new servers and switch over to it with
probability 1/n where n is the number of servers.


Re: [ovs-dev] [PATCH] ovsdb-cs: Avoid unnecessary re-connections when updating remotes.

2021-06-29 Thread Dumitru Ceara
On 6/29/21 12:56 PM, Ilya Maximets wrote:
> If a new database server is added to the cluster, or if one of the
> database servers changes its IP address or port, then you need to
> update the list of remotes for the client.  For example, if a new
> OVN_Southbound database server is added, you need to update the
> ovn-remote for the ovn-controller.
> 
> However, in the current implementation, the ovsdb-cs module always
> closes the current connection and creates a new one.  This can lead
> to a storm of re-connections if all ovn-controllers are updated
> simultaneously.  They can also start re-downloading the database
> content, creating even more load on the database servers.
> 
> Correct this by saving an existing connection if it is still in the
> list of remotes after the update.
> 
> 'reconnect' module will report connection state updates, but that
> is OK since no real re-connection happened and we only updated the
> state of a new 'reconnect' instance.
> 
> If required, re-connection can be forced after the update of remotes
> with ovsdb_cs_force_reconnect().
> 
> Signed-off-by: Ilya Maximets 
> ---

I tried it out and it works fine; the code looks OK to me:

Acked-by: Dumitru Ceara 

Thanks,
Dumitru



[ovs-dev] [PATCH] ovsdb-cs: Avoid unnecessary re-connections when updating remotes.

2021-06-29 Thread Ilya Maximets
If a new database server is added to the cluster, or if one of the
database servers changes its IP address or port, then you need to
update the list of remotes for the client.  For example, if a new
OVN_Southbound database server is added, you need to update the
ovn-remote for the ovn-controller.

However, in the current implementation, the ovsdb-cs module always
closes the current connection and creates a new one.  This can lead
to a storm of re-connections if all ovn-controllers are updated
simultaneously.  They can also start re-downloading the database
content, creating even more load on the database servers.

Correct this by saving an existing connection if it is still in the
list of remotes after the update.

The 'reconnect' module will report connection state updates, but that
is OK since no real re-connection happened and we only updated the
state of a new 'reconnect' instance.

If required, re-connection can be forced after the update of remotes
with ovsdb_cs_force_reconnect().

Signed-off-by: Ilya Maximets 
---
 lib/jsonrpc.c  | 13 +
 lib/jsonrpc.h  |  1 +
 lib/ovsdb-cs.c | 25 +
 lib/svec.c | 11 +++
 lib/svec.h |  1 +
 5 files changed, 47 insertions(+), 4 deletions(-)

diff --git a/lib/jsonrpc.c b/lib/jsonrpc.c
index 926cbcb86..7e333ae25 100644
--- a/lib/jsonrpc.c
+++ b/lib/jsonrpc.c
@@ -952,6 +952,19 @@ jsonrpc_session_steal(struct jsonrpc_session *s)
 return rpc;
 }
 
+void
+jsonrpc_session_replace(struct jsonrpc_session *s, struct jsonrpc *rpc)
+{
+if (s->rpc) {
+jsonrpc_close(s->rpc);
+}
+s->rpc = rpc;
+if (s->rpc) {
+reconnect_set_name(s->reconnect, jsonrpc_get_name(s->rpc));
+reconnect_connected(s->reconnect, time_msec());
+}
+}
+
 static void
 jsonrpc_session_disconnect(struct jsonrpc_session *s)
 {
diff --git a/lib/jsonrpc.h b/lib/jsonrpc.h
index d75d66b86..034a30b71 100644
--- a/lib/jsonrpc.h
+++ b/lib/jsonrpc.h
@@ -114,6 +114,7 @@ struct jsonrpc_session 
*jsonrpc_session_open_unreliably(struct jsonrpc *,
 void jsonrpc_session_close(struct jsonrpc_session *);
 
 struct jsonrpc *jsonrpc_session_steal(struct jsonrpc_session *);
+void jsonrpc_session_replace(struct jsonrpc_session *, struct jsonrpc *);
 
 void jsonrpc_session_run(struct jsonrpc_session *);
 void jsonrpc_session_wait(struct jsonrpc_session *);
diff --git a/lib/ovsdb-cs.c b/lib/ovsdb-cs.c
index 911b71dd4..22630955d 100644
--- a/lib/ovsdb-cs.c
+++ b/lib/ovsdb-cs.c
@@ -660,7 +660,8 @@ ovsdb_cs_wait(struct ovsdb_cs *cs)
 
 /* Network connection. */
 
-/* Changes the remote and creates a new session.
+/* Changes the remote and creates a new session.  Keeps existing connection
+ * if current remote is still valid.
  *
  * If 'retry' is true, the connection to the remote will automatically retry
  * when it fails.  If 'retry' is false, the connection is one-time. */
@@ -670,9 +671,12 @@ ovsdb_cs_set_remote(struct ovsdb_cs *cs, const char 
*remote, bool retry)
 if (cs
 && ((remote != NULL) != (cs->remote != NULL)
 || (remote && cs->remote && strcmp(remote, cs->remote {
+struct jsonrpc *rpc = NULL;
+
 /* Close the old session, if any. */
 if (cs->session) {
-jsonrpc_session_close(cs->session);
+/* Save the current open connection and close the session. */
+rpc = jsonrpc_session_steal(cs->session);
 cs->session = NULL;
 
 free(cs->remote);
@@ -682,17 +686,30 @@ ovsdb_cs_set_remote(struct ovsdb_cs *cs, const char 
*remote, bool retry)
 /* Open new session, if any. */
 if (remote) {
 struct svec remotes = SVEC_EMPTY_INITIALIZER;
+struct uuid old_cid = cs->cid;
+
 ovsdb_session_parse_remote(remote, , >cid);
 if (cs->shuffle_remotes) {
 svec_shuffle();
 }
 cs->session = jsonrpc_session_open_multiple(, retry);
-svec_destroy();
 
-cs->state_seqno = UINT_MAX;
+/* Use old connection, if cluster id didn't change and the remote
+ * is still on the list, to avoid unnecessary re-connection. */
+if (rpc && uuid_equals(_cid, >cid)
+&& svec_contains_unsorted(, jsonrpc_get_name(rpc))) {
+jsonrpc_session_replace(cs->session, rpc);
+cs->state_seqno = jsonrpc_session_get_seqno(cs->session);
+rpc = NULL;
+} else {
+cs->state_seqno = UINT_MAX;
+}
 
+svec_destroy();
 cs->remote = xstrdup(remote);
 }
+
+jsonrpc_close(rpc);
 }
 }
 
diff --git a/lib/svec.c b/lib/svec.c
index c1b986bab..6ea96384b 100644
--- a/lib/svec.c
+++ b/lib/svec.c
@@ -247,6 +247,17 @@ svec_contains(const struct svec *svec, const char *name)
 return svec_find(svec, name) != SIZE_MAX;
 }
 
+bool
+svec_contains_unsorted(const struct svec *svec, const char *name)
+{
+for