On Thu, Jan 29, 2026 at 02:38:14PM +0100, Dumitru Ceara wrote:
> Hi Tiago, Mairtin,
>
> On 1/29/26 2:24 PM, Tiago Matos Carvalho Reis via discuss wrote:
> > On Thu, Jan 29, 2026 at 09:11, Mairtin O'Loingsigh
> > <[email protected]> wrote:
> >>
> >> On Wed, Jan 28, 2026 at 03:55:26PM -0300, Tiago Matos Carvalho Reis wrote:
> >>> Hi everyone,
> >>>
> >>> I have been working on implementing incremental processing in OVN-IC and
> >>> encountered a design issue regarding how OVN-IC handles multi-AZ writes.
> >>>
> >>> The Issue
> >>> In a scenario where multiple AZs are connected via OVN-IC, certain events
> >>> trigger all AZs to attempt writing the same data to the ISB/INB
> >>> simultaneously. This race condition leads to a constraint violation, which
> >>> causes the transaction to fail and forces a full recompute.
> >>>
> >>> Example:
> >>> A clear example of this can be seen in ovn-ic.c:ts_run:
> >>>
> >>>     if (ctx->ovnisb_txn) {
> >>>         /* Create ISB Datapath_Binding */
> >>>         ICNBREC_TRANSIT_SWITCH_FOR_EACH (ts, ctx->ovninb_idl) {
> >>>             const struct icsbrec_datapath_binding *isb_dp =
> >>>                 shash_find_and_delete(isb_ts_dps, ts->name);
> >>>             if (!isb_dp) {
> >>>                 /* Allocate tunnel key */
> >>>                 int64_t dp_key = allocate_dp_key(dp_tnlids, vxlan_mode,
> >>>                                                  "transit switch datapath");
> >>>                 if (!dp_key) {
> >>>                     continue;
> >>>                 }
> >>>
> >>>                 isb_dp = icsbrec_datapath_binding_insert(ctx->ovnisb_txn);
> >>>                 icsbrec_datapath_binding_set_transit_switch(isb_dp,
> >>>                                                             ts->name);
> >>>                 icsbrec_datapath_binding_set_tunnel_key(isb_dp, dp_key);
> >>>             } else if (dp_key_refresh) {
> >>>                 /* Refresh tunnel key since encap mode has changed.
> >>>                  */
> >>>                 int64_t dp_key = allocate_dp_key(dp_tnlids, vxlan_mode,
> >>>                                                  "transit switch datapath");
> >>>                 if (dp_key) {
> >>>                     icsbrec_datapath_binding_set_tunnel_key(isb_dp,
> >>>                                                             dp_key);
> >>>                 }
> >>>             }
> >>>
> >>>             if (!isb_dp->type) {
> >>>                 icsbrec_datapath_binding_set_type(isb_dp,
> >>>                                                   "transit-switch");
> >>>             }
> >>>
> >>>             if (!isb_dp->nb_ic_uuid) {
> >>>                 icsbrec_datapath_binding_set_nb_ic_uuid(isb_dp,
> >>>                                                         &ts->header_.uuid,
> >>>                                                         1);
> >>>             }
> >>>         }
> >>>
> >>>         struct shash_node *node;
> >>>         SHASH_FOR_EACH (node, isb_ts_dps) {
> >>>             icsbrec_datapath_binding_delete(node->data);
> >>>         }
> >>>     }
> >>>
> >>> When a new transit-switch is created, every AZ attempts to create the same
> >>> datapath_binding on the ISB. Only one request succeeds; the others fail
> >>> with a "constraint-violation."
> >>>
> >>> Impact:
> >>> This behavior negates the performance benefits of implementing incremental
> >>> processing, as the system falls back to a full recompute upon these
> >>> failures.
> >>>
> >>> For development purposes, I am currently ignoring these errors, but the
> >>> ideal way of fixing this issue is to have a mechanism where only a single
> >>> AZ handles the writes, but this would require implementing some consensus
> >>> protocol.
> >>>
> >>> Does anyone have any advice on how we can fix this issue?
> >> ovn-ic in each AZ enumerates all existing ISB datapaths in the
> >> enumerate_datapaths function, then attempts to add missing datapaths.
> >> Since multiple AZs will attempt to add the same missing entry, all but
> >> the first will fail, causing transaction errors. Currently, ovn-ic will
> >> enumerate the ISB datapaths again, see the entry that succeeded, and
> >> continue to create NB in the local AZ. This solution does cause a
> >> transaction error on all but one AZ whenever a transit router is added,
> >> but we currently don't have a mechanism to manage this gracefully across
> >> multiple AZs.
> >
> > Hi Mairtin, thanks for the reply.
> >
> > Since there is no mechanism to manage which AZ should insert the data,
> > the only good solution I came up with, besides implementing a
> > full-fledged consensus algorithm like Raft to select a leader AZ, is to
> > simply set an option in IC_NB_Global to manually configure a specific AZ
> > as the leader, and in the code check whether the AZ is the leader or not.
> >
> > Example:
> > $ ovn-ic-nbctl set IC_NB_Global . options:leader=az1
> >
> > In the code:
> >
> > const struct icnbrec_ic_nb_global *icnb_global =
> >     icnbrec_ic_nb_global_table_first(ic_nb_global_table);
> >
> > const struct nbrec_nb_global *nb_global =
> >     nbrec_nb_global_table_first(nb_global_table);
> >
> > /* smap_get() returns NULL when the option is unset. */
> > const char *leader = smap_get(&icnb_global->options, "leader");
> > if (leader && !strcmp(leader, nb_global->name)) {
> >     // Insert logic here
> > }
> >
> > Do you have any opinion on this approach?
> >
>
> I was thinking of something a bit different (not too different though).
>
> The hierarchy is:
>
>                      IC-NB
>                        |
>     ovn-ic (AZ1)  ovn-ic (AZ2) ... ovn-ic (AZN)
>                        |
>                      IC-SB
>
> Conceptually this is similar to the intra-az hierarchy:
>
>                             NB
>                             |
>     ovn-northd (active)  ovn-northd (backup) ... ovn-northd (backup)
>                             |
>                             SB
>
> The way the instances synchronize is by taking the (single) SB database
> lock. Only one northd succeeds, so that one becomes the "active".
>
> What if we do the same for ovn-ic?
>
> Make all ovn-ic try to take the IC-SB lock. Only the one that succeeds
> becomes "active" and may write to the IC-SB.
>
> That has one implication though: the active instance (it can be any
> ovn-ic in any AZ) must also make sure the IC-SB port bindings and
> datapaths for other AZs are up to date. Today it only takes care of the
> resources for its own AZ.
>
> Each ovn-ic instance, active or backup, is still responsible for writing
> to the per-AZ OVN NB database based on the contents of the IC-NB and IC-SB
> centralized databases.
>
> I didn't check the code for this in too much detail though, so there
> might be other things to consider.
>
> What do you think?
>
> Regards,
> Dumitru
>
> >>>
> >>> Thanks,
> >>> Tiago Matos
> >>
> >>
> >> Hi Tiago,
> >>
> >> I ran into similar issues when adding transit router support and have
> >> added a comment above. I have also been working on OVN-IC related
> >> features, so if you would like to discuss the above issue further, or
> >> other OVN-IC work, I would like to help.
> >>
> >> Regards,
> >> Mairtin
> >>
> >
> >
> > Regards,
> > Tiago Matos
>

Hi Dumitru,

A lock similar to northd seems like a good solution. Do you think
serializing access to the ISB might have a significant negative performance
impact?

Mairtin
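
To make the lock idea above a bit more concrete, here is a minimal
standalone model of it. It is only a sketch: it does not touch the real
OVSDB IDL, and every name in it (struct ic_sb, ic_sb_try_lock, ts_sync,
count_violations) is a hypothetical stand-in invented for illustration. It
assumes the IC-SB enforces one Datapath_Binding per transit switch name and
that the first instance to request the lock keeps it. In the real daemon
the lock would presumably go through the same IDL calls ovn-northd uses
(ovsdb_idl_set_lock() and ovsdb_idl_has_lock()), but that wiring is not
shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_DPS 16

/* Hypothetical stand-in for the IC-SB: one database-wide lock plus a
 * Datapath_Binding table that is unique on the transit switch name. */
struct ic_sb {
    const char *lock_owner;     /* AZ holding the lock, or NULL. */
    const char *dps[MAX_DPS];   /* Transit switch names already bound. */
    int n_dps;
    int n_violations;           /* Inserts rejected as duplicates. */
};

/* Models taking the database lock: the first requester wins and keeps it.
 * Returns true if 'az' holds the lock after the call. */
bool
ic_sb_try_lock(struct ic_sb *sb, const char *az)
{
    if (!sb->lock_owner) {
        sb->lock_owner = az;
    }
    return !strcmp(sb->lock_owner, az);
}

/* Models an insert transaction.  Fails, and counts a constraint
 * violation, when a binding for 'ts' already exists. */
bool
ic_sb_insert_dp(struct ic_sb *sb, const char *ts)
{
    for (int i = 0; i < sb->n_dps; i++) {
        if (!strcmp(sb->dps[i], ts)) {
            sb->n_violations++;
            return false;
        }
    }
    assert(sb->n_dps < MAX_DPS);
    sb->dps[sb->n_dps++] = ts;
    return true;
}

/* One ovn-ic iteration for 'az'.  With 'use_lock', only the lock holder
 * writes to the IC-SB; backups skip the write.  Returns true if this
 * instance's insert succeeded. */
bool
ts_sync(struct ic_sb *sb, const char *az, const char *ts, bool use_lock)
{
    if (use_lock && !ic_sb_try_lock(sb, az)) {
        return false;
    }
    return ic_sb_insert_dp(sb, ts);
}

/* Runs one iteration of every AZ against a fresh IC-SB and returns the
 * number of constraint violations that resulted. */
int
count_violations(const char *azs[], int n_azs, const char *ts, bool use_lock)
{
    struct ic_sb sb = { 0 };

    for (int i = 0; i < n_azs; i++) {
        ts_sync(&sb, azs[i], ts, use_lock);
    }
    return sb.n_violations;
}
```

In this model, three AZs racing to create the same transit switch binding
produce two constraint violations without the lock and none with it. What
the model leaves out is the implication Dumitru mentions: the lock holder
would also have to write the IC-SB resources of the other AZs, and the cost
of that is not captured here.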
