On 19 Dec 2022, at 11:52, Peng He wrote:
> Eelco Chaudron <[email protected]> 于2022年12月16日周五 23:00写道: > >> >> >> On 16 Dec 2022, at 8:56, Peng He wrote: >> >>> From: Peng He <[email protected]> >>> To: Eelco Chaudron <[email protected]> >>> Cc: Ilya Maximets <[email protected]>, [email protected] >>> Subject: Re: [ovs-dev v7 1/3] ofproto-dpif-upcall: fix push_dp_ops >>> Date: Fri, 16 Dec 2022 15:56:32 +0800 >>> >>> Eelco Chaudron <[email protected]> 于2022年12月13日周二 20:36写道: >>> >>>> >>>> >>>> On 10 Dec 2022, at 1:37, Peng He wrote: >>>> >>>>> Patch v5 has statistics issues. >>>>> >>>>> In order to solve this issue, we had a discussion. >>>>> >>>>> below is the quote of the email. >>>>> >>>>> ” >>>>> After a second thought, I think maybe keeping INCONSISTENT just for the >>>>> modify error is a better option. >>>>> >>>>> With current patch: >>>>> 1. >>>>> the modify error case: >>>>> OPERATIONAL -> INCONSISTENT -> EVICTING -> EVICTED >>>>> 2. >>>>> the delete error case: >>>>> EVICTING -> EVICTED >>>>> >>>>> Change both to INCONSISTENT: >>>>> >>>>> the modify error case: >>>>> did not change. >>>>> >>>>> the delete error case: >>>>> EVICTING -> INCONSISTENT -> EVICTED? >>>>> >>>>> “ >>>>> >>>>> And we agree to take the second solution. >>>> >>>> I know, but going over the state meanings again, UKEY_EVICTING means the >>>> following: >>>> >>>> /* Ukey is in umap, datapath flow delete is queued. */ >>>> >>>> Which now no longer is the case, so should a new state not make more >> sense? >>>> >>> >>> Why it's no longer valid? >>> >>> In the patch, only modify failed ukey will be set to EVICTING, is it just >>> right fit the meaning of >>> EVICTING? (ukey in the umap, but delete operation is queued?) >> >> But it’s not as the delete operation is not queued, that is done in the >> revalidator_sweep__() part. >> > > Understand now. > > >> >>>> >>>> Any one else has some input on this?? >>>> >>>>> Eelco Chaudron <[email protected]> 于2022年12月8日周四 18:54写道: >>>>> >>>>>> >>>>>> >>>>>> On 27 Nov 2022, at 8:28, Peng He wrote: >>>>>> >>>>>>> push_dp_ops only handles delete ops errors but ignores the modify >>>>>>> ops results. It's better to handle all the dp operation errors in >>>>>>> a consistent way. >>>>>>> >>>>>>> We observe in the production environment that sometimes a megaflow >>>>>>> with wrong actions keep staying in datapath. The coverage command >> shows >>>>>>> revalidators have dumped several times, however the correct >>>>>>> actions are not set. This implies that the ukey's action does not >>>>>>> equal to the meagaflow's, i.e. revalidators think the underlying >>>>>>> megaflow's actions are correct however they are not. >>>>>>> >>>>>>> We also check the megaflow using the ofproto/trace command, and the >>>>>>> actions are not matched with the ones in the actual magaflow. By >>>>>>> performing a revalidator/purge command, the right actions are set. >>>>>>> >>>>>>> This patch prevents the inconsistency by considering modify failure >>>>>>> in revalidators. >>>>>>> >>>>>>> To note, we cannot perform two state transitions and change >> ukey_state >>>>>>> into UKEY_EVICTED directly here, because, if we do so, the >>>>>>> sweep will remove the ukey alone and leave dp flow alive. Later, the >>>>>>> dump will retrieve the dp flow and might even recover it. This will >>>>>>> contribute the stats of this dp flow twice. >>>>>>> >>>>>>> Signed-off-by: Peng He <[email protected]> >>>>>>> --- >>>>>>> ofproto/ofproto-dpif-upcall.c | 34 >> +++++++++++++++++++++++----------- >>>>>>> 1 file changed, 23 insertions(+), 11 deletions(-) >>>>>>> >>>>>>> diff --git a/ofproto/ofproto-dpif-upcall.c >>>>>> b/ofproto/ofproto-dpif-upcall.c >>>>>>> index 7ad728adf..c2cefbeb8 100644 >>>>>>> --- a/ofproto/ofproto-dpif-upcall.c >>>>>>> +++ b/ofproto/ofproto-dpif-upcall.c >>>>>>> @@ -2416,26 +2416,30 @@ push_dp_ops(struct udpif *udpif, struct >> ukey_op >>>>>> *ops, size_t n_ops) >>>>>>> >>>>>>> for (i = 0; i < n_ops; i++) { >>>>>>> struct ukey_op *op = &ops[i]; >>>>>>> - struct dpif_flow_stats *push, *stats, push_buf; >>>>>>> - >>>>>>> - stats = op->dop.flow_del.stats; >>>>>>> - push = &push_buf; >>>>>>> - >>>>>>> - if (op->dop.type != DPIF_OP_FLOW_DEL) { >>>>>>> - /* Only deleted flows need their stats pushed. */ >>>>>>> - continue; >>>>>>> - } >>>>>>> >>>>>>> if (op->dop.error) { >>>>>>> - /* flow_del error, 'stats' is unusable. */ >>>>>>> if (op->ukey) { >>>>>>> ovs_mutex_lock(&op->ukey->mutex); >>>>>>> - transition_ukey(op->ukey, UKEY_EVICTED); >>>>>>> + if (op->dop.type == DPIF_OP_FLOW_DEL) { >>>>>>> + transition_ukey(op->ukey, UKEY_EVICTED); >>>>>>> + } else { >>>> >>>> I think we could use a comment here to make sure why we set it to >>>> evicting. Maybe just a reference to the comment in revalidator_sweep__() >>>> might be enough. >>>> >>>>>>> + transition_ukey(op->ukey, UKEY_EVICTING); >>>>>>> + } >>>>>>> ovs_mutex_unlock(&op->ukey->mutex); >>>>>>> } >>>>>>> continue; >>>>>>> } >>>>>>> >>>>>>> + if (op->dop.type != DPIF_OP_FLOW_DEL) { >>>>>>> + /* Only deleted flows need their stats pushed. */ >>>>>>> + continue; >>>>>>> + } >>>>>>> + >>>>>>> + struct dpif_flow_stats *push, *stats, push_buf; >>>>>>> + >>>>>>> + stats = op->dop.flow_del.stats; >>>>>>> + push = &push_buf; >>>>>>> + >>>>>>> if (op->ukey) { >>>>>>> ovs_mutex_lock(&op->ukey->mutex); >>>>>>> transition_ukey(op->ukey, UKEY_EVICTED); >>>>>>> @@ -2848,6 +2852,14 @@ revalidator_sweep__(struct revalidator >>>>>> *revalidator, bool purge) >>>>>>> continue; >>>>>>> } >>>>>>> ukey_state = ukey->state; >>>>>>> + >>>>>>> + if (ukey_state == UKEY_EVICTING) { >>>>>>> + /* previous modify operation fails on this ukey and >>>>>> ukey_state >>>>>>> + * is set to UKEY_EVICTING, issue a delete operation >>>> on >>>>>> this >>>>>>> + * ukey. >>>>>>> + */ >>>>>>> + delete_op_init(udpif, &ops[n_ops++], ukey); >>>> >>>> This would cause the later push_ukey_ops() to call ukey_delete() which >>>> will call transition_ukey(ukey, UKEY_DELETED). >>>> Which I think will result in an invalid state transaction from >>>> UKEY_EVICTING to UKEY_DELETED. >>>> >>> >>> No, push_ukey_ops calls push_dp_ops, and push_dp_ops will delete the >>> datapath key first and then set the state to UKEY_EVICTED. >> >> ACK, you are right. Missed the transition in the successful case as we >> have a DEL operation. >> >>>>>> >>>>>> How can we be sure this state here is only for failing to modify >>>>>> operations? >>>>>> This state is set in other places in the code also. >>>>>> >>>>> >>>>> because revalidate_sweep is called after revaldiate. all the datapath >> ops >>>>> have already been >>>>> pushed, and if there are ukeys at UKEY_EVICTING, it's because >> previously >>>>> datapath ops >>>>> have errors. and if del ops fail, ukey is deleted. so only ukeys at >>>>> UKEY_EVICTING must >>>>> come from modify fail. >>>> >>>> First of all, I missed that p_purge_cb() was pausing the revalidators, >> so >>>> you are right that there is no other way to set UKEY_EVICTING other than >>>> the sweet phase. >>>> >>>> I also missed the part that the ukey_delete() will do the actual >> removal, >>>> and not the if (ukey_state == UKEY_EVICTED) statement lower. >>>> >>>> Maybe we should reconsider having a special state? Anyhow, I do think >> the >>>> comment in the code needs some changing to make this clear: >>>> >>> >>> Yes the comments are clearer, and I can add them in the next version. >>> >>> >>> >>>> >>>> >>>> if (ukey_state == UKEY_EVICTING) { >>>> /* A previous modify operation in push_dp_ops() failed >> and >>>> * caused the state to be UKEY_EVICTING. We need to >>>> manually >>>> * push a delete operation on this ukey to get both the >>>> * datapath flow and ukey removed. The later >>>> push_ukey_ops() >>>> * will call push_dp_ops() and ukey_delete() to >> accomplish >>>> * this. */ >>>> delete_op_init(udpif, &ops[n_ops++], ukey); >>>> } >>>> >>>>>> Maybe we should keep the INCONSISTENT state to avoid this (see v5)? >>>>>> >>>> >>>> What about the case the push_dp_ops()/push_ukey_ops() fails in >>>> revalidator_sweep__() for a none DELETE action, who will take care of >> this? >>>> The alternative fix might be better in this case. >>>> >>> >>> DELETE failure in the original code is taken cared. It will go to EVICTED >>> directly. >>> This leaves a possibility that a key is in the dp but not in the ukey, >>> which is fine, because we have the mecahnism already: >>> >>> check ukey_create_from_dpif_flow and ukey_acquire fail path. >> >> I meant to say if we do a modify in revalidator_sweep__() >> >> If we take the following branch, >> >> if (ukey_state == UKEY_OPERATIONAL >> || (ukey_state == UKEY_VISIBLE && purge)) { >> >> and then revalidate_ukey() gets called and it returns UKEY_MODIFY which >> calls push_dp_ops() and now assume this modify fails… >> >> This will leaf the ukey in UKEY_EVICTING state, as there is no additional >> revalidator_sweep__() call. >> >> This will lead to the below in the next revalidator() call: >> >> VLOG_INFO("Unexpected ukey transition from state %d " >> "(last transitioned from thread %u at %s)", >> >> Or am I missing something? >> > > Yes, it's correct. I miss that part. > > the problem of modify fail is that it needs to initial an extra delete > operation, thus, it relies always a next around process. > > So we can fix these issues by: > > 1) introduce INCONSISTENT state. modify fails change to INCONSISTENT. del > fail follows the original code path > 2) normal revalidator part process INCONSISTENT, changing it into EVICTING > and initial a new delete operation. > > the sweep part will not process INCONSISTENT, leaving all the processing > into the normal revalidator case. So isn’t this moving back to option 1 in patch v5? Which sounds good to me. _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
