Hi, Zhike,

After receiving your email, I was becoming curious about this code and did
more investigation on it.

and I found some problems with the code and now I believe this inconsistent
processing is NOT the root cause for the inconsistent actions between ukey
and datapath.
and I found a new cause for that, but due to this complex race between PMD
and revalidator, I wish this time I am right.

But before that, why are you interested in this patch? Have you found the
same issue in your environment?




On Thu, Sep 22, 2022 at 6:54 PM .贺鹏 <hepeng.0...@bytedance.com> wrote:

> Hi, zhike,
>
> It's difficult to give a very clear sequences about how this inconsistency
> happens, but I can give you more details.
>
> This is observed in our production environment. The correct megaflow
> should encap packets with vxlan header and send out, but the action is drop.
> This is usually because the neigh info is not available at the moment when
> the upcall happens.
>
> Normally, the drop action is ephemeral, and reavalidator will later modify
> the megaflow's action into the tnl_push.
>
> But there are a few of cases, only happened 1~2 times in a year, where the
> drop actions will never be replaced by tnl_push.
>
> just like in the commits mentioned,
>
> "The coverage command shows revalidators have dumped several times,
> however the correct actions are not set. This implies that the ukey's
> action does not equal to the meagaflow's, i.e. revalidators think the
> underlying
>
> megaflow's actions are correct however they are not."
>
>
> I do not know how this happened, but I do think this inconsistent processing 
> could be one of the reasons.
>
> Even there is no such bug, I think keeping processing inconsistent is 
> necessary.
>
>
>
>
>
> On Wed, Sep 21, 2022 at 5:57 PM 王志克 <wangzh...@jd.com> wrote:
>
>> Hi Hepeng,
>>
>>
>>
>> Can you please explain the sequence that how this inconsistence could
>> happen? Why you believe the current actions in existing netdev_flow is old?
>>
>>
>>
>> Thanks.
>>
>>
>>
>> Br,
>>
>> wangzhike
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> *****************************************************************************************************************************
>>
>> [ovs-dev,ovs-dev,v2,4/4] dpif-netdev: fix inconsistent processing between
>> ukey and megaflow
>>
>> Message ID
>>
>> 20220604151857.66550-4-hepeng.0...@bytedance.com
>>
>> State
>>
>> New
>>
>> Headers
>>
>> show
>>
>> Series
>>
>> [ovs-dev,ovs-dev,v2,1/4] ofproto-dpif-upcall: fix push_dp_ops
>> <http://patchwork.ozlabs.org/project/openvswitch/list/?series=303324>|
>> expand
>> Checks
>>
>> Context
>>
>> Check
>>
>> Description
>>
>> ovsrobot/apply-robot
>>
>> *warning*
>>
>> apply and check: warning
>> <https://mail.openvswitch.org/pipermail/ovs-build/2022-June/022431.html>
>>
>> ovsrobot/github-robot-_Build_and_Test
>>
>> *success*
>>
>> github build: passed
>> <https://mail.openvswitch.org/pipermail/ovs-build/2022-June/022436.html>
>>
>> ovsrobot/intel-ovs-compilation
>>
>> *success*
>>
>> test: success
>> <https://mail.openvswitch.org/pipermail/ovs-build/2022-June/022439.html>
>> Commit Message
>>
>> Peng He
>> <http://patchwork.ozlabs.org/project/openvswitch/list/?submitter=78087>June
>> 4, 2022, 3:18 p.m. UTC
>>
>> When PMDs perform upcalls, the newly generated ukey will replace
>>
>> the old, however, the newly generated mageflow will be discard
>>
>> to reuse the old one without checking if the actions of new and
>>
>> old are equal.
>>
>>
>>
>> We observe in the production environment that sometimes a megaflow
>>
>> with wrong actions keep staying in datapath. The coverage command shows
>>
>> revalidators have dumped serveral times, however the correct
>>
>> actions are not set. This implies that the ukey's action does not
>>
>> equal to the meagaflow's, i.e. revalidators think the underlying
>>
>> megaflow's actions are correct however they are not.
>>
>>
>>
>> We also check the megaflow using the ofproto/trace command, and the
>>
>> actions are not matched with the ones in the actual magaflow. By
>>
>> performing a revalidator/purge command, the right actions are set.
>>
>>
>>
>> *Signed-off-by: Peng He <hepeng.0...@bytedance.com 
>> <hepeng.0...@bytedance.com>>*
>>
>> ---
>>
>>  lib/dpif-netdev.c | 17 ++++++++++++++++-
>>
>>  1 file changed, 16 insertions(+), 1 deletion(-)
>>
>> Comments
>>
>> 0-day Robot
>> <http://patchwork.ozlabs.org/project/openvswitch/list/?submitter=74326>June
>> 4, 2022, 3:44 p.m. UTC | #1
>> <http://patchwork.ozlabs.org/comment/2907057/>
>>
>> Bleep bloop.  Greetings Peng He, I am a robot and I have tried out your 
>> patch.
>>
>> Thanks for your contribution.
>>
>>
>>
>> I encountered some error that I wasn't expecting.  See the details below.
>>
>>
>>
>>
>>
>> checkpatch:
>>
>> ERROR: Author Peng He <xnhp0...@gmail.com> needs to sign off.
>>
>> WARNING: Unexpected sign-offs from developers who are not authors or 
>> co-authors or committers: Peng He <hepeng.0...@bytedance.com>
>>
>> Lines checked: 58, Warnings: 1, Errors: 1
>>
>>
>>
>>
>>
>> Please check this out.  If you feel there has been an error, please email 
>> acon...@redhat.com
>>
>>
>>
>> Thanks,
>>
>> 0-day Robot
>>
>> 1638948diff
>> <http://patchwork.ozlabs.org/project/openvswitch/patch/20220604151857.66550-4-hepeng.0...@bytedance.com/raw/>
>> mbox
>> <http://patchwork.ozlabs.org/project/openvswitch/patch/20220604151857.66550-4-hepeng.0...@bytedance.com/mbox/>
>> series <http://patchwork.ozlabs.org/series/303324/mbox/>
>> Patch
>>
>> *diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c*
>>
>> *index ff57b3961..985c25c58 100644*
>>
>> *--- a/lib/dpif-netdev.c*
>>
>> *+++ b/lib/dpif-netdev.c*
>>
>> *@@ -8305,7 +8305,22 @@*  handle_packet_upcall(struct dp_netdev_pmd_thread 
>> *pmd,
>>
>>           * to be locking revalidators out of making flow modifications. */
>>
>>          ovs_mutex_lock(&pmd->flow_mutex);
>>
>>          netdev_flow = dp_netdev_pmd_lookup_flow(pmd, key, NULL);
>>
>> -        if (OVS_LIKELY(!netdev_flow)) {
>>
>> +        if (OVS_UNLIKELY(netdev_flow)) {
>>
>> +            struct dp_netdev_actions *old_act =
>>
>> +                dp_netdev_flow_get_actions(netdev_flow);
>>
>> +
>>
>> +            if ((add_actions->size != old_act->size) ||
>>
>> +                    memcmp(old_act->actions, add_actions->data,
>>
>> +                                             add_actions->size)) {
>>
>> +
>>
>> +               struct dp_netdev_actions *new_act =
>>
>> +                   dp_netdev_actions_create(add_actions->data,
>>
>> +                                            add_actions->size);
>>
>> +
>>
>> +               ovsrcu_set(&netdev_flow->actions, new_act);
>>
>> +               ovsrcu_postpone(dp_netdev_actions_free, old_act);
>>
>> +            }
>>
>> +        } else {
>>
>>              netdev_flow = dp_netdev_flow_add(pmd, &match, &ufid,
>>
>>                                               add_actions->data,
>>
>>                                               add_actions->size, 
>> orig_in_port);
>>
>>
>>
>>
>>
>
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to