Eelco Chaudron <[email protected]> 于2022年12月16日周五 23:00写道:
> > > On 16 Dec 2022, at 8:56, Peng He wrote: > > > From: Peng He <[email protected]> > > To: Eelco Chaudron <[email protected]> > > Cc: Ilya Maximets <[email protected]>, [email protected] > > Subject: Re: [ovs-dev v7 1/3] ofproto-dpif-upcall: fix push_dp_ops > > Date: Fri, 16 Dec 2022 15:56:32 +0800 > > > > Eelco Chaudron <[email protected]> 于2022年12月13日周二 20:36写道: > > > >> > >> > >> On 10 Dec 2022, at 1:37, Peng He wrote: > >> > >>> Patch v5 has statistics issues. > >>> > >>> In order to solve this issue, we had a discussion. > >>> > >>> below is the quote of the email. > >>> > >>> ” > >>> After a second thought, I think maybe keeping INCONSISTENT just for the > >>> modify error is a better option. > >>> > >>> With current patch: > >>> 1. > >>> the modify error case: > >>> OPERATIONAL -> INCONSISTENT -> EVICTING -> EVICTED > >>> 2. > >>> the delete error case: > >>> EVICTING -> EVICTED > >>> > >>> Change both to INCONSISTENT: > >>> > >>> the modify error case: > >>> did not change. > >>> > >>> the delete error case: > >>> EVICTING -> INCONSISTENT -> EVICTED? > >>> > >>> “ > >>> > >>> And we agree to take the second solution. > >> > >> I know, but going over the state meanings again, UKEY_EVICTING means the > >> following: > >> > >> /* Ukey is in umap, datapath flow delete is queued. */ > >> > >> Which now no longer is the case, so should a new state not make more > sense? > >> > > > > Why it's no longer valid? > > > > In the patch, only modify failed ukey will be set to EVICTING, is it just > > right fit the meaning of > > EVICTING? (ukey in the umap, but delete operation is queued?) > > But it’s not as the delete operation is not queued, that is done in the > revalidator_sweep__() part. > Understand now. > > >> > >> Any one else has some input on this?? > >> > >>> Eelco Chaudron <[email protected]> 于2022年12月8日周四 18:54写道: > >>> > >>>> > >>>> > >>>> On 27 Nov 2022, at 8:28, Peng He wrote: > >>>> > >>>>> push_dp_ops only handles delete ops errors but ignores the modify > >>>>> ops results. It's better to handle all the dp operation errors in > >>>>> a consistent way. > >>>>> > >>>>> We observe in the production environment that sometimes a megaflow > >>>>> with wrong actions keep staying in datapath. The coverage command > shows > >>>>> revalidators have dumped several times, however the correct > >>>>> actions are not set. This implies that the ukey's action does not > >>>>> equal to the meagaflow's, i.e. revalidators think the underlying > >>>>> megaflow's actions are correct however they are not. > >>>>> > >>>>> We also check the megaflow using the ofproto/trace command, and the > >>>>> actions are not matched with the ones in the actual magaflow. By > >>>>> performing a revalidator/purge command, the right actions are set. > >>>>> > >>>>> This patch prevents the inconsistency by considering modify failure > >>>>> in revalidators. > >>>>> > >>>>> To note, we cannot perform two state transitions and change > ukey_state > >>>>> into UKEY_EVICTED directly here, because, if we do so, the > >>>>> sweep will remove the ukey alone and leave dp flow alive. Later, the > >>>>> dump will retrieve the dp flow and might even recover it. This will > >>>>> contribute the stats of this dp flow twice. > >>>>> > >>>>> Signed-off-by: Peng He <[email protected]> > >>>>> --- > >>>>> ofproto/ofproto-dpif-upcall.c | 34 > +++++++++++++++++++++++----------- > >>>>> 1 file changed, 23 insertions(+), 11 deletions(-) > >>>>> > >>>>> diff --git a/ofproto/ofproto-dpif-upcall.c > >>>> b/ofproto/ofproto-dpif-upcall.c > >>>>> index 7ad728adf..c2cefbeb8 100644 > >>>>> --- a/ofproto/ofproto-dpif-upcall.c > >>>>> +++ b/ofproto/ofproto-dpif-upcall.c > >>>>> @@ -2416,26 +2416,30 @@ push_dp_ops(struct udpif *udpif, struct > ukey_op > >>>> *ops, size_t n_ops) > >>>>> > >>>>> for (i = 0; i < n_ops; i++) { > >>>>> struct ukey_op *op = &ops[i]; > >>>>> - struct dpif_flow_stats *push, *stats, push_buf; > >>>>> - > >>>>> - stats = op->dop.flow_del.stats; > >>>>> - push = &push_buf; > >>>>> - > >>>>> - if (op->dop.type != DPIF_OP_FLOW_DEL) { > >>>>> - /* Only deleted flows need their stats pushed. */ > >>>>> - continue; > >>>>> - } > >>>>> > >>>>> if (op->dop.error) { > >>>>> - /* flow_del error, 'stats' is unusable. */ > >>>>> if (op->ukey) { > >>>>> ovs_mutex_lock(&op->ukey->mutex); > >>>>> - transition_ukey(op->ukey, UKEY_EVICTED); > >>>>> + if (op->dop.type == DPIF_OP_FLOW_DEL) { > >>>>> + transition_ukey(op->ukey, UKEY_EVICTED); > >>>>> + } else { > >> > >> I think we could use a comment here to make sure why we set it to > >> evicting. Maybe just a reference to the comment in revalidator_sweep__() > >> might be enough. > >> > >>>>> + transition_ukey(op->ukey, UKEY_EVICTING); > >>>>> + } > >>>>> ovs_mutex_unlock(&op->ukey->mutex); > >>>>> } > >>>>> continue; > >>>>> } > >>>>> > >>>>> + if (op->dop.type != DPIF_OP_FLOW_DEL) { > >>>>> + /* Only deleted flows need their stats pushed. */ > >>>>> + continue; > >>>>> + } > >>>>> + > >>>>> + struct dpif_flow_stats *push, *stats, push_buf; > >>>>> + > >>>>> + stats = op->dop.flow_del.stats; > >>>>> + push = &push_buf; > >>>>> + > >>>>> if (op->ukey) { > >>>>> ovs_mutex_lock(&op->ukey->mutex); > >>>>> transition_ukey(op->ukey, UKEY_EVICTED); > >>>>> @@ -2848,6 +2852,14 @@ revalidator_sweep__(struct revalidator > >>>> *revalidator, bool purge) > >>>>> continue; > >>>>> } > >>>>> ukey_state = ukey->state; > >>>>> + > >>>>> + if (ukey_state == UKEY_EVICTING) { > >>>>> + /* previous modify operation fails on this ukey and > >>>> ukey_state > >>>>> + * is set to UKEY_EVICTING, issue a delete operation > >> on > >>>> this > >>>>> + * ukey. > >>>>> + */ > >>>>> + delete_op_init(udpif, &ops[n_ops++], ukey); > >> > >> This would cause the later push_ukey_ops() to call ukey_delete() which > >> will call transition_ukey(ukey, UKEY_DELETED). > >> Which I think will result in an invalid state transaction from > >> UKEY_EVICTING to UKEY_DELETED. > >> > > > > No, push_ukey_ops calls push_dp_ops, and push_dp_ops will delete the > > datapath key first and then set the state to UKEY_EVICTED. > > ACK, you are right. Missed the transition in the successful case as we > have a DEL operation. > > >>>> > >>>> How can we be sure this state here is only for failing to modify > >>>> operations? > >>>> This state is set in other places in the code also. > >>>> > >>> > >>> because revalidate_sweep is called after revaldiate. all the datapath > ops > >>> have already been > >>> pushed, and if there are ukeys at UKEY_EVICTING, it's because > previously > >>> datapath ops > >>> have errors. and if del ops fail, ukey is deleted. so only ukeys at > >>> UKEY_EVICTING must > >>> come from modify fail. > >> > >> First of all, I missed that p_purge_cb() was pausing the revalidators, > so > >> you are right that there is no other way to set UKEY_EVICTING other than > >> the sweet phase. > >> > >> I also missed the part that the ukey_delete() will do the actual > removal, > >> and not the if (ukey_state == UKEY_EVICTED) statement lower. > >> > >> Maybe we should reconsider having a special state? Anyhow, I do think > the > >> comment in the code needs some changing to make this clear: > >> > > > > Yes the comments are clearer, and I can add them in the next version. > > > > > > > >> > >> > >> if (ukey_state == UKEY_EVICTING) { > >> /* A previous modify operation in push_dp_ops() failed > and > >> * caused the state to be UKEY_EVICTING. We need to > >> manually > >> * push a delete operation on this ukey to get both the > >> * datapath flow and ukey removed. The later > >> push_ukey_ops() > >> * will call push_dp_ops() and ukey_delete() to > accomplish > >> * this. */ > >> delete_op_init(udpif, &ops[n_ops++], ukey); > >> } > >> > >>>> Maybe we should keep the INCONSISTENT state to avoid this (see v5)? > >>>> > >> > >> What about the case the push_dp_ops()/push_ukey_ops() fails in > >> revalidator_sweep__() for a none DELETE action, who will take care of > this? > >> The alternative fix might be better in this case. > >> > > > > DELETE failure in the original code is taken cared. It will go to EVICTED > > directly. > > This leaves a possibility that a key is in the dp but not in the ukey, > > which is fine, because we have the mecahnism already: > > > > check ukey_create_from_dpif_flow and ukey_acquire fail path. > > I meant to say if we do a modify in revalidator_sweep__() > > If we take the following branch, > > if (ukey_state == UKEY_OPERATIONAL > || (ukey_state == UKEY_VISIBLE && purge)) { > > and then revalidate_ukey() gets called and it returns UKEY_MODIFY which > calls push_dp_ops() and now assume this modify fails… > > This will leaf the ukey in UKEY_EVICTING state, as there is no additional > revalidator_sweep__() call. > > This will lead to the below in the next revalidator() call: > > VLOG_INFO("Unexpected ukey transition from state %d " > "(last transitioned from thread %u at %s)", > > Or am I missing something? > Yes, it's correct. I miss that part. the problem of modify fail is that it needs to initial an extra delete operation, thus, it relies always a next around process. So we can fix these issues by: 1) introduce INCONSISTENT state. modify fails change to INCONSISTENT. del fail follows the original code path 2) normal revalidator part process INCONSISTENT, changing it into EVICTING and initial a new delete operation. the sweep part will not process INCONSISTENT, leaving all the processing into the normal revalidator case. > > Cheers, > > Eelco > > -- hepeng _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
