On 6 Nov 2022, at 8:12, Peng He wrote:
> push_dp_ops only handles delete ops errors but ignores the modify > ops results. It's better to handle all the dp operation errors in > a consistent way. > > We observe in the production environment that sometimes a megaflow > with wrong actions keep staying in datapath. The coverage command shows > revalidators have dumped several times, however the correct > actions are not set. This implies that the ukey's action does not > equal to the meagaflow's, i.e. revalidators think the underlying > megaflow's actions are correct however they are not. > > We also check the megaflow using the ofproto/trace command, and the > actions are not matched with the ones in the actual magaflow. By > performing a revalidator/purge command, the right actions are set. > > This patch prevents the inconsistency by considering modify failure > in revalidators. > > To note, we cannot perform two state transitions and change ukey_state > into UKEY_EVICTED directly here, because, if we do so, the > sweep will remove the ukey alone and leave dp flow alive. Later, the > dump will retrieve the dp flow and might even recover it. This will > contribute the stats of this dp flow twice. > > Signed-off-by: Peng He <[email protected]> Hi Peng, Thanks for looking at the statistics part, see some comments inline! In addition, I already acked patch 2 out of this series, but it mentions patch x/3, but I do not see patch 3 in this series. Is this missing? Or are there only two patches left? Cheers, Eelco > --- > ofproto/ofproto-dpif-upcall.c | 39 ++++++++++++++++++++++------------- > 1 file changed, 25 insertions(+), 14 deletions(-) > > diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c > index 7ad728adf..a7970fa9b 100644 > --- a/ofproto/ofproto-dpif-upcall.c > +++ b/ofproto/ofproto-dpif-upcall.c > @@ -254,6 +254,7 @@ enum ukey_state { > UKEY_CREATED = 0, > UKEY_VISIBLE, /* Ukey is in umap, datapath flow install is queued. > */ > UKEY_OPERATIONAL, /* Ukey is in umap, datapath flow is installed. */ > + UKEY_INCONSISTENT, /* Ukey is in umap, datapath flow is modified but > failed */ > UKEY_EVICTING, /* Ukey is in umap, datapath flow delete is queued. > */ > UKEY_EVICTED, /* Ukey is in umap, datapath flow is deleted. */ > UKEY_DELETED, /* Ukey removed from umap, ukey free is deferred. */ > @@ -1966,6 +1967,10 @@ transition_ukey_at(struct udpif_key *ukey, enum > ukey_state dst, > * UKEY_VISIBLE -> UKEY_EVICTED > * A handler attempts to install the flow, but the datapath rejects it. > * Consider that the datapath has already destroyed it. > + * UKEY_OPERATIONAL -> UKEY_INCONSISTENT > + * A revalidator modifies the flow with error returns. > + * UKEY_INCONSISTENT -> UKEY_EVICTING > + * A revalidator decides to evict the datapath flow. > * UKEY_OPERATIONAL -> UKEY_EVICTING > * A revalidator decides to evict the datapath flow. > * UKEY_EVICTING -> UKEY_EVICTED > @@ -1974,7 +1979,8 @@ transition_ukey_at(struct udpif_key *ukey, enum > ukey_state dst, > * A revalidator has removed the ukey from the umap and is deleting it. > */ > if (ukey->state == dst - 1 || (ukey->state == UKEY_VISIBLE && > - dst < UKEY_DELETED)) { > + dst < UKEY_DELETED) || > + (ukey->state == UKEY_OPERATIONAL && dst == UKEY_EVICTING)) { > ukey->state = dst; > } else { > struct ds ds = DS_EMPTY_INITIALIZER; > @@ -2416,26 +2422,31 @@ push_dp_ops(struct udpif *udpif, struct ukey_op *ops, > size_t n_ops) > > for (i = 0; i < n_ops; i++) { > struct ukey_op *op = &ops[i]; > - struct dpif_flow_stats *push, *stats, push_buf; > - > - stats = op->dop.flow_del.stats; > - push = &push_buf; > - > - if (op->dop.type != DPIF_OP_FLOW_DEL) { > - /* Only deleted flows need their stats pushed. */ > - continue; > - } > > if (op->dop.error) { > - /* flow_del error, 'stats' is unusable. */ > if (op->ukey) { > ovs_mutex_lock(&op->ukey->mutex); > - transition_ukey(op->ukey, UKEY_EVICTED); > + if (op->dop.type == DPIF_OP_FLOW_DEL) We should {} to the if branch. > + transition_ukey(op->ukey, UKEY_EVICTED); > + else { > + transition_ukey(op->ukey, UKEY_INCONSISTENT); Is there a specific reason to have delete go through a different state on failure? If not, it might be more consistent especially when debugging. > + } > ovs_mutex_unlock(&op->ukey->mutex); > } > + /* if modify or delete fails, there is no need to push stats */ Captial for “If” and ending with a dot. > continue; > } > > + if (op->dop.type != DPIF_OP_FLOW_DEL) { > + /* Only deleted flows need their stats pushed. */ > + continue; > + } > + > + struct dpif_flow_stats *push, *stats, push_buf; > + > + stats = op->dop.flow_del.stats; > + push = &push_buf; > + > if (op->ukey) { > ovs_mutex_lock(&op->ukey->mutex); > transition_ukey(op->ukey, UKEY_EVICTED); > @@ -2848,14 +2859,14 @@ revalidator_sweep__(struct revalidator *revalidator, > bool purge) > continue; > } > ukey_state = ukey->state; > - if (ukey_state == UKEY_OPERATIONAL > + if (ukey_state == UKEY_OPERATIONAL || ukey_state == > UKEY_INCONSISTENT > || (ukey_state == UKEY_VISIBLE && purge)) { > struct recirc_refs recircs = RECIRC_REFS_EMPTY_INITIALIZER; > bool seq_mismatch = (ukey->dump_seq != dump_seq > && ukey->reval_seq != reval_seq); > enum reval_result result; > > - if (purge) { > + if (purge || ukey_state == UKEY_INCONSISTENT) { > result = UKEY_DELETE; > } else if (!seq_mismatch) { > result = UKEY_KEEP; > -- > 2.25.1 > > _______________________________________________ > dev mailing list > [email protected] > https://mail.openvswitch.org/mailman/listinfo/ovs-dev _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
