On Wed, Jan 28, 2026 at 03:55:26PM -0300, Tiago Matos Carvalho Reis wrote:
> Hi everyone,
> 
> I have been working on implementing incremental processing in OVN-IC and
> encountered a design issue regarding how OVN-IC handles multi-AZ writes.
> 
> The Issue
> In a scenario where multiple AZs are connected via OVN-IC, certain events
> trigger all AZs to attempt writing the same data to the ISB/INB
> simultaneously. This race condition leads to a constraint violation, which
> causes the transaction to fail and forces a full recompute.
> 
> Example:
> A clear example of this can be seen in ovn-ic.c:ts_run:
> 
>     if (ctx->ovnisb_txn) {
>         /* Create ISB Datapath_Binding */
>         ICNBREC_TRANSIT_SWITCH_FOR_EACH (ts, ctx->ovninb_idl) {
>             const struct icsbrec_datapath_binding *isb_dp =
>                 shash_find_and_delete(isb_ts_dps, ts->name);
>             if (!isb_dp) {
>                 /* Allocate tunnel key */
>                 int64_t dp_key = allocate_dp_key(dp_tnlids, vxlan_mode,
>                                                  "transit switch datapath");
>                 if (!dp_key) {
>                     continue;
>                 }
> 
>                 isb_dp = icsbrec_datapath_binding_insert(ctx->ovnisb_txn);
>                 icsbrec_datapath_binding_set_transit_switch(isb_dp,
>                                                             ts->name);
>                 icsbrec_datapath_binding_set_tunnel_key(isb_dp, dp_key);
>             } else if (dp_key_refresh) {
>                 /* Refresh tunnel key since encap mode has changed. */
>                 int64_t dp_key = allocate_dp_key(dp_tnlids, vxlan_mode,
>                                                  "transit switch datapath");
>                 if (dp_key) {
>                     icsbrec_datapath_binding_set_tunnel_key(isb_dp, dp_key);
>                 }
>             }
> 
>             if (!isb_dp->type) {
>                 icsbrec_datapath_binding_set_type(isb_dp, "transit-switch");
>             }
> 
>             if (!isb_dp->nb_ic_uuid) {
>                 icsbrec_datapath_binding_set_nb_ic_uuid(isb_dp,
>                                                         &ts->header_.uuid, 1);
>             }
>         }
> 
>         struct shash_node *node;
>         SHASH_FOR_EACH (node, isb_ts_dps) {
>             icsbrec_datapath_binding_delete(node->data);
>         }
>     }
> 
> When a new transit-switch is created, every AZ attempts to create the same
> datapath_binding on the ISB. Only one request succeeds; the others fail
> with a "constraint-violation."
> 
> Impact:
> This behavior negates the performance benefits of implementing incremental
> processing, as the system falls back to a full recompute upon these
> failures.
> 
> For development purposes, I am currently ignoring these errors, but the
> ideal way of fixing this issue is to have a mechanism where only a single
> AZ handles the writes but this would require implementing some consensus
> protocol.
> 
> Does anyone have any advice on how we can fix this issue?

ovn-ic in each AZ enumerates all existing ISB datapaths in the
enumerate_datapaths function, then attempts to add any that are missing.
Since multiple AZs attempt to add the same missing entry, all but the
first fail with a transaction error. Currently, ovn-ic then enumerates
the ISB datapaths again, sees the entry that succeeded, and continues
with creating the NB side in the local AZ. This does cause a transaction
error on all but one AZ whenever a transit router is added, but we
currently don't have a mechanism to manage this gracefully across
multiple AZs.
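
To make the sequence concrete, here is a minimal standalone sketch of the
race. It is not ovn-ic code: struct isb_db, isb_has, isb_commit_insert and
simulate_ts_add are hypothetical names, and a plain array with a uniqueness
check stands in for the ISB and its constraint. It assumes every AZ took
its snapshot before any insert was committed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_DPS 16

/* Hypothetical stand-in for the ISB Datapath_Binding table. */
struct isb_db {
    const char *ts_names[MAX_DPS];
    size_t n;
};

/* Models enumerating the ISB datapaths and looking for one by name. */
static bool
isb_has(const struct isb_db *db, const char *ts_name)
{
    for (size_t i = 0; i < db->n; i++) {
        if (!strcmp(db->ts_names[i], ts_name)) {
            return true;
        }
    }
    return false;
}

/* Models committing an insert.  Returns false when a row with the same
 * name already exists, standing in for a constraint violation. */
static bool
isb_commit_insert(struct isb_db *db, const char *ts_name)
{
    if (isb_has(db, ts_name)) {
        return false;
    }
    assert(db->n < MAX_DPS);
    db->ts_names[db->n++] = ts_name;
    return true;
}

/* Simulates 'n_azs' instances that all enumerated the ISB before any of
 * them committed.  Returns the number of failed transactions. */
int
simulate_ts_add(struct isb_db *db, const char *ts_name, int n_azs)
{
    /* Shared stale snapshot: taken once, before any commit. */
    bool missing = !isb_has(db, ts_name);
    int failed = 0;

    for (int az = 0; az < n_azs; az++) {
        if (missing && !isb_commit_insert(db, ts_name)) {
            /* This AZ lost the race.  On its next run it enumerates
             * again, finds the row, and carries on. */
            failed++;
        }
    }
    return failed;
}
```

Starting from an empty isb_db, simulate_ts_add(&db, "ts1", 3) returns 2
and leaves a single row behind; calling it again returns 0 because every
AZ now sees the existing entry. In this model the result converges, and
the cost is the failed transactions themselves.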
> 
> Thanks,
> Tiago Matos


Hi Tiago,

I ran into similar issues when adding transit router support and have
added a comment inline above. I have also been working on OVN-IC related
features, so if you would like to discuss this issue further, or other
OVN-IC work, I would be glad to help.

Regards,
Mairtin

_______________________________________________
discuss mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
