Re: [ovs-discuss] Possible bug with monitor_cond_since and fast re-sync of ovn-controllers

2023-09-21 Thread Oleksandr Mykhalskyi via discuss
Hi Ilya,

Thanks for the quick reply and directions.

"Do you mean a RAFT leader change/transfer?  I'm assuming so, because you 
mention disabling of leadership transfer on compaction"

Yes, I am talking about RAFT leader change/transfer.

"I don't understand why that is supposed to help, because clients should not 
disconnect on leadership change in the first place.  All the SbDB clients, 
except for ovn-northd, should not require leader-only connections. Could you, 
please, check why is this happening?"

Yes, you are right, I totally forgot about our custom connection settings in the SB
DB, which force clients to use the leader node only:
  ovn-sbctl set-connection ptcp:6642:
  ovn-sbctl set connection . inactivity_probe=6
We also have a custom daemon that periodically checks whether the RAFT leader has
changed and puts the address of the new RAFT leader into the connection string.
It was implemented this way some time ago, because "inactivity_probe=6" did not
take effect with the simple setting "ovn-sbctl set-connection ptcp:6642"; clients
with long-running tasks still got disconnected from the SB DB after the default
5 sec probe interval.
Ok, it seems I need to review and change this approach.
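If I understand the suggested direction correctly, the simpler setup would look
roughly like this (just a sketch from my side; the probe values are illustrative
examples that still need to be tested, not something from this thread):

  # Plain listener in the Connection table (applies cluster-wide), no pinning to the leader:
  ovn-sbctl set-connection ptcp:6642
  # Server-side inactivity probe for that listener (in ms, example value):
  ovn-sbctl set connection . inactivity_probe=60000
  # Client-side probe for ovn-controller on every hypervisor (in ms, example value):
  ovs-vsctl set open . external_ids:ovn-remote-probe-interval=60000

As far as I know, the neutron OVN agents have their own probe interval option in
the neutron configuration, which would need to be aligned as well.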

"It may be a possibility.  Not making it configurable, but just increasing it 
to a much higher value (it's capped by the size of the database itself, so 
should not grow too large)."

It would be great if so.
I still think that a transaction history of 100 records can cause issues
with fast re-sync in big clouds, where a lot of clients fill this limit very
quickly with transactions like the ones below (I would consider them heartbeats):

{"eid":"75718296-7ff5-4d6d-9fe8-129fe4ea2075","data":[null,{"Chassis_Private":{"8a94ede6-ee86-443f-a861-19320b500767":{"nb_cfg":1078267,"nb_cfg_timestamp":1695208789023}},"_comment":"ovn-controller","_date":1695208789142,"_is_diff":true}],"term":9866,"index":8583308}
{"eid":"aefff81c-76ae-4978-ac38-c129ed1a1009","data":[null,{"Chassis_Private":{"94e607a0-9e1a-4ca4-9e26-b1cbdcdf362a":{"external_ids":["map",[["neutron:ovn-metadata-sb-cfg","1078278"]]]}},"_date":1695209448827,"_is_diff":true}],"term":9866,"index":8583427}


Thanks!

Regards, Oleksandr


-Original Message-
From: Ilya Maximets 
Sent: Wednesday, September 20, 2023 11:34 PM
To: Oleksandr Mykhalskyi ; 
'b...@openvswitch.org' 
Cc: Alexander Povaliukhin ; 
i.maxim...@ovn.org
Subject: Re: [ovs-discuss] Possible bug with monitor_cond_since and fast 
re-sync of ovn-controllers


Re: [ovs-discuss] Possible bug with monitor_cond_since and fast re-sync of ovn-controllers

2023-09-20 Thread Ilya Maximets via discuss
On 9/20/23 20:39, Oleksandr Mykhalskyi via discuss wrote:
> Dear OVS developers,
> 
>  
> 
> After recent update of our openstack cloud from xena to yoga (OVS 2.17 and 
> OVN 22.03), we faced critical issue related to reconnection of 
> ovn-controllers to OVN SB DB.
> 
> From 2.17, python-ovs supports monitor_cond_since to fast re-sync of 
> ovn-controllers during reconnection to Southbound DB.
> 
> Method monitor_cond_since looks at server`s transaction history to find 
> client`s last-txn-id.
> 
> Transaction history limit is hardcoded with 100 records - 
> https://github.com/openvswitch/ovs/blob/master/ovsdb/transaction.c#L1666C39-L1666C39,
> and its root cause of our issues.
> 
>  
> 
> For neutron with OVN, we have at least 2 transactions for each hypervisor per 
> 60 sec, ovn-controller and ovn-metadata-agent send heartbeats in this way, 
> updating timestamp for corresponding records in Chassis_Private table, 
> something like:
> 
> / 
> {"eid":"75718296-7ff5-4d6d-9fe8-129fe4ea2075","data":[null,{"Chassis_Private":{"8a94ede6-ee86-443f-a861-19320b500767":{"nb_cfg":1078267,"nb_cfg_timestamp":1695208789023}},"_comment":"ovn-controller","_date":1695208789142,"_is_diff":true}],"term":9866,"index":8583308}/
> 
> / 
> {"eid":"aefff81c-76ae-4978-ac38-c129ed1a1009","data":[null,{"Chassis_Private":{"94e607a0-9e1a-4ca4-9e26-b1cbdcdf362a":{"external_ids":["map",[["neutron:ovn-metadata-sb-cfg","1078278"]]]}},"_date":1695209448827,"_is_diff":true}],"term":9866,"index":8583427}/
> 
> For our cloud with 250 hypervisors, we have at least 500 such heartbeat 
> transactions per 60 sec
> 
>  
> 
> Now look at next case - SB DB master moved to other node, agents of 250 
> hypervisors try to reconnect to SB DB nodes while searching master one.

Do you mean a RAFT leader change/transfer?  I'm assuming so, because
you mention disabling of leadership transfer on compaction.

But neither ovn-controller nor neutron agents are leader-only.
Why do they re-connect on leadership change?  Sounds strange.

> 
> First ~50 hypervisors reconnected and did fast re-sync, sent heartbeats and 
> filled transaction history by 100 records in this way.
> 
> All other hypervisors are not able to do fast re-sync anymore, they try to 
> download whole SB contents and hung OVSDB completely.
> 
>  
> 
> Its not a theoretical case, we observed such behavior couple of times after 
> update.
> 
> To stabilize this situation, we had to stop all ovn-controllers  and start 
> them one by one.

It's not necessary to start them one by one.  New connections
are processed one by one on the server side, so already connected
instances should not disconnect again (ovsdb-server should be
able to reply to inactivity probes for them in time).
But they should not really re-connect on leadership change in
the first place.

> 
> Here are examples from a small test cloud:
> 
> Reconnection of ovn-controller, when fewer than 100 txns were executed
> 
> 2023-09-20T11:45:43.629Z|12099|jsonrpc|DBG|tcp:172.31.23.11:6642: send
> request, method="monitor_cond_since",
> params=["OVN_Southbound",...,"8f7caad3-c20c-4563-ae20-8eaaf0f22287"],
> id=40940
> 
> 2023-09-20T11:45:43.635Z|12101|jsonrpc|DBG|tcp:172.31.23.11:6642:
> received reply,
> result=[true,"1a70474d-c437-4af1-ac2e-4a6261c5ca70",{"SB_Global"],
> id=40940
> 
> {"eid":"8f7caad3-c20c-4563-ae20-8eaaf0f22287","data":[null,{"Chassis_Private":{"f2d5a60d-03ff-4d60-ad6c-36c30e7f8546":{"nb_cfg":1078289,"nb_cfg_timestamp":1695210108374}},"_comment":"ovn-controller","_date":1695210108457,"_is_diff":true}],"term":9866,"index":8583570}
> 
> {"eid":"1a70474d-c437-4af1-ac2e-4a6261c5ca70","data":[null,{"Chassis_Private":{"f2d5a60d-03ff-4d60-ad6c-36c30e7f8546":{"external_ids":["map",[["neutron:ovn-metadata-sb-cfg","1078292"]]]}},"_date":1695210343461,"_is_diff":true}],"term":9866,"index":8583603}
> 
> Reconnection of ovn-controller, when more than 100 txns were executed
> 
> 2023-09-20T11:31:24.300Z|11888|jsonrpc|DBG|tcp:172.31.23.11:6642: send
> request, method="monitor_cond_since", params=["OVN_Southbound",...,
> "75718296-7ff5-4d6d-9fe8-129fe4ea2075"], id=40891
> 
> 2023-09-20T11:31:24.840Z|11890|jsonrpc|DBG|tcp:172.31.23.11:6642:
> received reply,
> result=[false,"aefff81c-76ae-4978-ac38-c129ed1a1009",{"RBAC_Role],
> id=40891
> 
> ### downloading of full SB DB contents...
> 
> {"eid":"75718296-7ff5-4d6d-9fe8-129fe4ea2075","data":[null,{"Chassis_Private":{"8a94ede6-ee86-443f-a861-19320b500767":{"nb_cfg":1078267,"nb_cfg_timestamp":1695208789023}},"_comment":"ovn-controller","_date":1695208789142,"_is_diff":true}],"term":9866,"index":8583308}
> 
> {"eid":"aefff81c-76ae-4978-ac38-c129ed1a1009","data":[null,{"Chassis_Private":{"94e607a0-9e1a-4ca4-9e26-b1cbdcdf362a":{"external_ids":["map",

[ovs-discuss] Possible bug with monitor_cond_since and fast re-sync of ovn-controllers

2023-09-20 Thread Oleksandr Mykhalskyi via discuss
Dear OVS developers,

After a recent update of our OpenStack cloud from Xena to Yoga (OVS 2.17 and OVN
22.03), we faced a critical issue related to the reconnection of ovn-controllers to
the OVN SB DB.
Starting from 2.17, python-ovs supports monitor_cond_since for fast re-sync of
ovn-controllers during reconnection to the Southbound DB.
The monitor_cond_since method looks at the server's transaction history to find the
client's last-txn-id.
The transaction history limit is hardcoded to 100 records -
https://github.com/openvswitch/ovs/blob/master/ovsdb/transaction.c#L1666C39-L1666C39,
and it is the root cause of our issues.

For neutron with OVN, we have at least 2 transactions per hypervisor every 60 sec:
ovn-controller and ovn-metadata-agent send heartbeats this way, updating the
timestamps of the corresponding records in the Chassis_Private table, something
like:
 
{"eid":"75718296-7ff5-4d6d-9fe8-129fe4ea2075","data":[null,{"Chassis_Private":{"8a94ede6-ee86-443f-a861-19320b500767":{"nb_cfg":1078267,"nb_cfg_timestamp":1695208789023}},"_comment":"ovn-controller","_date":1695208789142,"_is_diff":true}],"term":9866,"index":8583308}
 
{"eid":"aefff81c-76ae-4978-ac38-c129ed1a1009","data":[null,{"Chassis_Private":{"94e607a0-9e1a-4ca4-9e26-b1cbdcdf362a":{"external_ids":["map",[["neutron:ovn-metadata-sb-cfg","1078278"]]]}},"_date":1695209448827,"_is_diff":true}],"term":9866,"index":8583427}
For our cloud with 250 hypervisors, this gives at least 500 such heartbeat
transactions per 60 sec.

Now consider the following case: the SB DB master moves to another node, and the
agents of 250 hypervisors try to reconnect to the SB DB nodes while searching for
the master one.
The first ~50 hypervisors reconnect, do a fast re-sync, send their heartbeats and
thereby fill the transaction history with 100 new records.
All other hypervisors are then unable to do a fast re-sync; they try to download
the whole SB contents and hang the OVSDB server completely.

It is not a theoretical case - we observed such behavior a couple of times after
the update.
To stabilize the situation, we had to stop all ovn-controllers and start them
one by one.
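Roughly, the procedure looked like the following (an illustrative sketch only; the
host list, the pause and the use of plain ssh/systemctl are just examples, our
actual tooling is different):

  # stop ovn-controller on all hypervisors first
  for h in $(cat hypervisors.txt); do
      ssh "$h" systemctl stop ovn-controller
  done
  # then start them back one at a time, giving each one time to finish its full SB re-sync
  for h in $(cat hypervisors.txt); do
      ssh "$h" systemctl start ovn-controller
      sleep 30
  done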

Here are examples from a small test cloud:
  Reconnection of ovn-controller, when fewer than 100 txns were executed
   2023-09-20T11:45:43.629Z|12099|jsonrpc|DBG|tcp:172.31.23.11:6642: send 
request, method="monitor_cond_since", 
params=["OVN_Southbound",...,"8f7caad3-c20c-4563-ae20-8eaaf0f22287"], id=40940
  2023-09-20T11:45:43.635Z|12101|jsonrpc|DBG|tcp:172.31.23.11:6642: 
received reply, 
result=[true,"1a70474d-c437-4af1-ac2e-4a6261c5ca70",{"SB_Global"], id=40940

  
{"eid":"8f7caad3-c20c-4563-ae20-8eaaf0f22287","data":[null,{"Chassis_Private":{"f2d5a60d-03ff-4d60-ad6c-36c30e7f8546":{"nb_cfg":1078289,"nb_cfg_timestamp":1695210108374}},"_comment":"ovn-controller","_date":1695210108457,"_is_diff":true}],"term":9866,"index":8583570}
 
{"eid":"1a70474d-c437-4af1-ac2e-4a6261c5ca70","data":[null,{"Chassis_Private":{"f2d5a60d-03ff-4d60-ad6c-36c30e7f8546":{"external_ids":["map",[["neutron:ovn-metadata-sb-cfg","1078292"]]]}},"_date":1695210343461,"_is_diff":true}],"term":9866,"index":8583603}

   Reconnection of ovn-controller, when more than 100 txns were executed
 2023-09-20T11:31:24.300Z|11888|jsonrpc|DBG|tcp:172.31.23.11:6642: send 
request, method="monitor_cond_since", params=["OVN_Southbound",..., 
"75718296-7ff5-4d6d-9fe8-129fe4ea2075"], id=40891
 2023-09-20T11:31:24.840Z|11890|jsonrpc|DBG|tcp:172.31.23.11:6642: received 
reply, result=[false,"aefff81c-76ae-4978-ac38-c129ed1a1009",{"RBAC_Role], 
id=40891
 ### downloading of full SB DB contents

 
{"eid":"75718296-7ff5-4d6d-9fe8-129fe4ea2075","data":[null,{"Chassis_Private":{"8a94ede6-ee86-443f-a861-19320b500767":{"nb_cfg":1078267,"nb_cfg_timestamp":1695208789023}},"_comment":"ovn-controller","_date":1695208789142,"_is_diff":true}],"term":9866,"index":8583308}
 
{"eid":"aefff81c-76ae-4978-ac38-c129ed1a1009","data":[null,{"Chassis_Private":{"94e607a0-9e1a-4ca4-9e26-b1cbdcdf362a":{"external_ids":["map",[["neutron:ovn-metadata-sb-cfg","1078278"]]]}},"_date":1695209448827,"_is_diff":true}],"term":9866,"index":8583427}

Before Yoga, on kolla images with OVS 2.16, reconnection worked well.

In the long-term perspective, I can propose making the transaction history limit a
configurable parameter.

In the short-term perspective, all we can do now is avoid the transfer of
leadership that happens on snapshotting, by doing a manual DB compaction from time
to time within the 24-hour window.
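For reference, the manual compaction itself can be triggered with ovs-appctl,
something like the following (the control socket path depends on the installation):

  ovs-appctl -t /var/run/ovn/ovnsb_db.ctl ovsdb-server/compact OVN_Southbound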

I kindly ask you to review and comment on this information.

Thank you!


Oleksandr Mykhalskyi, System Engineer
Netcracker Technology



