On Wed, Apr 29, 2020 at 9:57 PM Dumitru Ceara <[email protected]> wrote:
> In some cases, if the NB/SB databases ovn-northd connects to are
> inconsistent, ovn-northd might generate transactions that fail
> continuously due to failed integrity checks on the SB database server.
>
> The first patch of the series addresses inconsistencies due to stale
> Datapath_Binding records in the SB database.
>
> The second patch of the series addresses inconsistencies due to stale
> tunnel_key values in various SB database table records.
>
> Reported-by: Dan Williams <[email protected]>
> Reported-at: https://bugzilla.redhat.com/1828637
> Signed-off-by: Dumitru Ceara <[email protected]>
>
> Dumitru Ceara (2):
>   ovn-northd: Clear SB records depending on stale datapaths.
>   ovn-northd: Fix tunnel_key allocation for SB records.
>

Hi Dumitru,

I did some testing in my ovn-fake-multinode setup. These are my
observations.

I created a logical switch sw0 with 4 logical ports, so the next tunnel
key should be 5. I stopped ovn-northd and manually created a couple of
port_binding entries using "ovn-sbctl create port_binding" with tunnel
keys 5 and 6. I also created a logical port in sw0. Then I started
ovn-northd.

ovn-northd deletes the port binding entries I added and creates the
port_binding entry for the logical port with tunnel_key=5 in the same
transaction.

I think ovn-northd syncs the south db based on the contents of the
north db. There's no harm in having your patches, but I'm not really
sure they resolve the issue we have observed.

Just to brief everyone about the issue we are seeing, we see the logs
below in ovn-northd.

*******
2020-04-16T23:02:33Z|00127|ovsdb_idl|WARN|transaction error: {"details":"Transaction causes multiple rows in \"Port_Binding\" table to have identical values (23eb9016-45f9-4158-be35-77b2713b9a0f and 7) for index on columns \"datapath\" and \"tunnel_key\". First row, with UUID e4f11a7b-09b6-454f-a125-34cc4b144ef6, had the following index values before the transaction: bdbb436e-f98c-4651-9b80-6e8b95044560 and 7. Second row, with UUID d37cc3f1-8633-440f-b145-8222a0d4723c, existed in the database before this transaction and was not modified by the transaction.","error":"constraint violation"}
******

Because of this constraint violation error, ovn-northd cannot write to
the sb db any further until it is restarted.

In my opinion this can only happen if ovn-northd doesn't see the port
binding row (which is actually present in the DB) in its IDL in-memory
db. I suspect this could have happened when ovn-northd reconnected to
the same master, or connected to a new master, and didn't get the
proper updates. Maybe in this case the IDL should request the db
contents with txn id = 0, so that it receives a complete dump of the db.

Is it possible that ovn-northd sees a port binding with tunnel key 'x'
and still allocates the same tunnel key 'x' to a new logical port? If
so, then your patches definitely make sense.

@Han - Have you seen this issue in your deployments? Do you have
comments here?

Thanks
Numan

>
>  northd/ovn-northd.c | 57 ++++++++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 45 insertions(+), 12 deletions(-)
>
>
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
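
To make the two cases above concrete, here is a minimal sketch in Python. It is a toy model, not ovn-northd or OVSDB code: PortBindingTable, allocate_tunnel_key and sync are hypothetical names, and the only behaviour taken from the discussion is the unique index on ("datapath", "tunnel_key") and the delete-and-create in a single transaction. It assumes the allocator simply picks the lowest key not present in its own view.

```python
class ConstraintViolation(Exception):
    """Raised when two rows share the same (datapath, tunnel_key)."""


class PortBindingTable:
    """Toy stand-in for the SB Port_Binding table (hypothetical)."""

    def __init__(self, rows):
        # logical_port -> (datapath, tunnel_key)
        self.rows = dict(rows)

    def commit(self, deletes, inserts):
        # Apply deletes and inserts atomically, then check the unique index.
        new_rows = {p: v for p, v in self.rows.items() if p not in deletes}
        new_rows.update(inserts)
        index = list(new_rows.values())
        if len(set(index)) != len(index):
            raise ConstraintViolation("duplicate (datapath, tunnel_key)")
        self.rows = new_rows


def allocate_tunnel_key(view, datapath):
    # Lowest key not used on this datapath, judged only by the local view.
    used = {key for dp, key in view.values() if dp == datapath}
    key = 1
    while key in used:
        key += 1
    return key


def sync(table, view, nb_ports, datapath):
    # Rows in the view with no matching NB port are stale and get deleted.
    stale = {p for p in view if p not in nb_ports}
    kept = {p: v for p, v in view.items() if p not in stale}
    inserts = {}
    for port in nb_ports:
        if port not in kept:
            key = allocate_tunnel_key({**kept, **inserts}, datapath)
            inserts[port] = (datapath, key)
    table.commit(stale, inserts)
    return inserts


base = {"sw0-p%d" % i: ("sw0", i) for i in range(1, 5)}
nb_ports = ["sw0-p%d" % i for i in range(1, 6)]

# Case 1: complete view. The manual rows with keys 5 and 6 are deleted and
# the new port is created in the same transaction.
t1 = PortBindingTable({**base, "manual-a": ("sw0", 5), "manual-b": ("sw0", 6)})
result_complete = sync(t1, dict(t1.rows), nb_ports, "sw0")

# Case 2: stale view. The row with key 5 exists on the server but is
# missing from the local view, so it is neither deleted nor avoided.
t2 = PortBindingTable({**base, "unseen": ("sw0", 5)})
try:
    sync(t2, dict(base), nb_ports, "sw0")
    stale_view_failed = False
except ConstraintViolation:
    stale_view_failed = True

print(result_complete, stale_view_failed)
```

In this model the complete-view case commits cleanly, while the stale-view case fails the index check on every retry, because the allocator keeps choosing the same key from the same incomplete view.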
