Hi, Zhike,
First I am trying to explain why this fix is not fixing it. This lock-and-lookup code is actually for the case when all megaflows are not per-PMD stored according to the ovs commit history. So it's necessary to check if the megaflow is installed by another PMD. Now megaflows are per-PMD stored and so the reason for megaflow insertion can only be upcalls from this PMD, except you are doing dpctl/flow-add to insert a megaflow manually, but I guess this is not the case. So, this code is mostly dead code, meaning that it's mostly impossible for the code to run into this place in the real environment. Revalidators only modify and delete megaflows. it will not add one. Upcalls only happends when the megaflow do not exist in the datapath. We can not expect a megaflow with drop action already resides in the cmap, while upcall still happens, brings the correct actions, however does not install the megaflow but reuse the old one. it's just impossible. Then my new suspect is that perhaps the ukey is now with drop action and megaflow is also with drop action. And now the revalidator tries to modify the actions, but somehow the modifying action fails. However, this failure is ignored. So the inconsistency now exists So my first patch actually fixes it... "[ovs-dev,ovs-dev,v2,1/4] ofproto-dpif-upcall: fix push_dp_ops" On Fri, Sep 23, 2022 at 9:39 PM 王志克 <[email protected]> wrote: > Hi Peng, > > > > Right, I also met this issue, and wondering the sequence for this > inconsistence, and would like to hear your “new cause”. > > > > Anyway I believe your below patch should fix this issue. > > [ovs-dev,ovs-dev,v2,1/4] ofproto-dpif-upcall: fix push_dp_ops > > > > Br, > > Zhike > > > > *From: *".贺鹏" <[email protected]> > *Date: *Friday, September 23, 2022 at 8:59 PM > *To: *王志克 <[email protected]> > *Cc: *"[email protected]" <[email protected]>, " > [email protected]" <[email protected]> > *Subject: *[来自外部的邮件]Re: [External] Re:[ovs-dev,ovs-dev,v2,4/4] > dpif-netdev: fix inconsistent processing between ukey and megaflow > > > > 京东安全提示:此封邮件来自公司外部,除非您能判断发件人和知道邮件内容安全,否则请勿打开链接或者附件。 > * JD Security Tips: Please do not click on links or open attachments > unless you trust the sender and know the content is safe.* > > > > Hi, Zhike, > > > > After receiving your email, I was becoming curious about this code and did > more investigation on it. > > > > and I found some problems with the code and now I believe this > inconsistent processing is NOT the root cause for the inconsistent actions > between ukey and datapath. > > and I found a new cause for that, but due to this complex race between PMD > and revalidator, I wish this time I am right. > > > > But before that, why are you interested in this patch? Have you found the > same issue in your environment? > > > > > > > > > > On Thu, Sep 22, 2022 at 6:54 PM .贺鹏 <[email protected]> wrote: > > Hi, zhike, > > > > It's difficult to give a very clear sequences about how this inconsistency > happens, but I can give you more details. > > > > This is observed in our production environment. The correct megaflow > should encap packets with vxlan header and send out, but the action is drop. > > This is usually because the neigh info is not available at the moment when > the upcall happens. > > > > Normally, the drop action is ephemeral, and reavalidator will later modify > the megaflow's action into the tnl_push. > > > > But there are a few of cases, only happened 1~2 times in a year, where the > drop actions will never be replaced by tnl_push. > > > > just like in the commits mentioned, > > > > "The coverage command shows revalidators have dumped several times, > > however the correct actions are not set. This implies that the ukey's > > action does not equal to the meagaflow's, i.e. revalidators think the > underlying > > megaflow's actions are correct however they are not." > > > > I do not know how this happened, but I do think this inconsistent processing > could be one of the reasons. > > Even there is no such bug, I think keeping processing inconsistent is > necessary. > > > > > > > > > > On Wed, Sep 21, 2022 at 5:57 PM 王志克 <[email protected]> wrote: > > Hi Hepeng, > > > > Can you please explain the sequence that how this inconsistence could > happen? Why you believe the current actions in existing netdev_flow is old? > > > > Thanks. > > > > Br, > > wangzhike > > > > > > > > > > > ***************************************************************************************************************************** > > [ovs-dev,ovs-dev,v2,4/4] dpif-netdev: fix inconsistent processing between > ukey and megaflow > > Message ID > > [email protected] > > State > > New > > Headers > > show > > Series > > [ovs-dev,ovs-dev,v2,1/4] ofproto-dpif-upcall: fix push_dp_ops > <http://patchwork.ozlabs.org/project/openvswitch/list/?series=303324>| > expand > Checks > > Context > > Check > > Description > > ovsrobot/apply-robot > > *warning* > > apply and check: warning > <https://mail.openvswitch.org/pipermail/ovs-build/2022-June/022431.html> > > ovsrobot/github-robot-_Build_and_Test > > *success* > > github build: passed > <https://mail.openvswitch.org/pipermail/ovs-build/2022-June/022436.html> > > ovsrobot/intel-ovs-compilation > > *success* > > test: success > <https://mail.openvswitch.org/pipermail/ovs-build/2022-June/022439.html> > Commit Message > > Peng He > <http://patchwork.ozlabs.org/project/openvswitch/list/?submitter=78087>June > 4, 2022, 3:18 p.m. UTC > > When PMDs perform upcalls, the newly generated ukey will replace > > the old, however, the newly generated mageflow will be discard > > to reuse the old one without checking if the actions of new and > > old are equal. > > > > We observe in the production environment that sometimes a megaflow > > with wrong actions keep staying in datapath. The coverage command shows > > revalidators have dumped serveral times, however the correct > > actions are not set. This implies that the ukey's action does not > > equal to the meagaflow's, i.e. revalidators think the underlying > > megaflow's actions are correct however they are not. > > > > We also check the megaflow using the ofproto/trace command, and the > > actions are not matched with the ones in the actual magaflow. By > > performing a revalidator/purge command, the right actions are set. > > > > *Signed-off-by: Peng He <[email protected] > <[email protected]>>* > > --- > > lib/dpif-netdev.c | 17 ++++++++++++++++- > > 1 file changed, 16 insertions(+), 1 deletion(-) > > Comments > > 0-day Robot > <http://patchwork.ozlabs.org/project/openvswitch/list/?submitter=74326>June > 4, 2022, 3:44 p.m. UTC | #1 <http://patchwork.ozlabs.org/comment/2907057/> > > Bleep bloop. Greetings Peng He, I am a robot and I have tried out your patch. > > Thanks for your contribution. > > > > I encountered some error that I wasn't expecting. See the details below. > > > > > > checkpatch: > > ERROR: Author Peng He <[email protected]> needs to sign off. > > WARNING: Unexpected sign-offs from developers who are not authors or > co-authors or committers: Peng He <[email protected]> > > Lines checked: 58, Warnings: 1, Errors: 1 > > > > > > Please check this out. If you feel there has been an error, please email > [email protected] > > > > Thanks, > > 0-day Robot > > 1638948diff > <http://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/raw/> > mbox > <http://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/mbox/> > series <http://patchwork.ozlabs.org/series/303324/mbox/> > Patch > > *diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c* > > *index ff57b3961..985c25c58 100644* > > *--- a/lib/dpif-netdev.c* > > *+++ b/lib/dpif-netdev.c* > > *@@ -8305,7 +8305,22 @@* handle_packet_upcall(struct dp_netdev_pmd_thread > *pmd, > > * to be locking revalidators out of making flow modifications. */ > > ovs_mutex_lock(&pmd->flow_mutex); > > netdev_flow = dp_netdev_pmd_lookup_flow(pmd, key, NULL); > > - if (OVS_LIKELY(!netdev_flow)) { > > + if (OVS_UNLIKELY(netdev_flow)) { > > + struct dp_netdev_actions *old_act = > > + dp_netdev_flow_get_actions(netdev_flow); > > + > > + if ((add_actions->size != old_act->size) || > > + memcmp(old_act->actions, add_actions->data, > > + add_actions->size)) { > > + > > + struct dp_netdev_actions *new_act = > > + dp_netdev_actions_create(add_actions->data, > > + add_actions->size); > > + > > + ovsrcu_set(&netdev_flow->actions, new_act); > > + ovsrcu_postpone(dp_netdev_actions_free, old_act); > > + } > > + } else { > > netdev_flow = dp_netdev_flow_add(pmd, &match, &ufid, > > add_actions->data, > > add_actions->size, > orig_in_port); > > > > > > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
