Re: [ovs-dev] [External] Re:[ovs-dev, ovs-dev, v2, 4/4] dpif-netdev: fix inconsistent processing between ukey and megaflow

2022-09-23 Thread . 贺鹏
Hi, Zhike,

After receiving your email, I was becoming curious about this code and did
more investigation on it.

and I found some problems with the code and now I believe this inconsistent
processing is NOT the root cause for the inconsistent actions between ukey
and datapath.
and I found a new cause for that, but due to this complex race between PMD
and revalidator, I wish this time I am right.

But before that, why are you interested in this patch? Have you found the
same issue in your environment?




On Thu, Sep 22, 2022 at 6:54 PM .贺鹏  wrote:

> Hi, zhike,
>
> It's difficult to give a very clear sequences about how this inconsistency
> happens, but I can give you more details.
>
> This is observed in our production environment. The correct megaflow
> should encap packets with vxlan header and send out, but the action is drop.
> This is usually because the neigh info is not available at the moment when
> the upcall happens.
>
> Normally, the drop action is ephemeral, and reavalidator will later modify
> the megaflow's action into the tnl_push.
>
> But there are a few of cases, only happened 1~2 times in a year, where the
> drop actions will never be replaced by tnl_push.
>
> just like in the commits mentioned,
>
> "The coverage command shows revalidators have dumped several times,
> however the correct actions are not set. This implies that the ukey's
> action does not equal to the meagaflow's, i.e. revalidators think the
> underlying
>
> megaflow's actions are correct however they are not."
>
>
> I do not know how this happened, but I do think this inconsistent processing 
> could be one of the reasons.
>
> Even there is no such bug, I think keeping processing inconsistent is 
> necessary.
>
>
>
>
>
> On Wed, Sep 21, 2022 at 5:57 PM 王志克  wrote:
>
>> Hi Hepeng,
>>
>>
>>
>> Can you please explain the sequence that how this inconsistence could
>> happen? Why you believe the current actions in existing netdev_flow is old?
>>
>>
>>
>> Thanks.
>>
>>
>>
>> Br,
>>
>> wangzhike
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> *
>>
>> [ovs-dev,ovs-dev,v2,4/4] dpif-netdev: fix inconsistent processing between
>> ukey and megaflow
>>
>> Message ID
>>
>> 20220604151857.66550-4-hepeng.0...@bytedance.com
>>
>> State
>>
>> New
>>
>> Headers
>>
>> show
>>
>> Series
>>
>> [ovs-dev,ovs-dev,v2,1/4] ofproto-dpif-upcall: fix push_dp_ops
>> |
>> expand
>> Checks
>>
>> Context
>>
>> Check
>>
>> Description
>>
>> ovsrobot/apply-robot
>>
>> *warning*
>>
>> apply and check: warning
>> 
>>
>> ovsrobot/github-robot-_Build_and_Test
>>
>> *success*
>>
>> github build: passed
>> 
>>
>> ovsrobot/intel-ovs-compilation
>>
>> *success*
>>
>> test: success
>> 
>> Commit Message
>>
>> Peng He
>> June
>> 4, 2022, 3:18 p.m. UTC
>>
>> When PMDs perform upcalls, the newly generated ukey will replace
>>
>> the old, however, the newly generated mageflow will be discard
>>
>> to reuse the old one without checking if the actions of new and
>>
>> old are equal.
>>
>>
>>
>> We observe in the production environment that sometimes a megaflow
>>
>> with wrong actions keep staying in datapath. The coverage command shows
>>
>> revalidators have dumped serveral times, however the correct
>>
>> actions are not set. This implies that the ukey's action does not
>>
>> equal to the meagaflow's, i.e. revalidators think the underlying
>>
>> megaflow's actions are correct however they are not.
>>
>>
>>
>> We also check the megaflow using the ofproto/trace command, and the
>>
>> actions are not matched with the ones in the actual magaflow. By
>>
>> performing a revalidator/purge command, the right actions are set.
>>
>>
>>
>> *Signed-off-by: Peng He > >*
>>
>> ---
>>
>>  lib/dpif-netdev.c | 17 -
>>
>>  1 file changed, 16 insertions(+), 1 deletion(-)
>>
>> Comments
>>
>> 0-day Robot
>> June
>> 4, 2022, 3:44 p.m. UTC | #1
>> 
>>
>> Bleep bloop.  Greetings Peng He, I am a robot and I have tried out your 
>> patch.
>>
>> Thanks for your contribution.
>>
>>
>>
>> I encountered some error that I wasn't expecting.  See the details below.
>>
>>
>>
>>
>>
>> checkpatch:
>>
>> ERROR: Author Peng He  needs to sign off.
>>
>> WARNING: Unexpected sign-offs from developers who are not authors or 
>> co-authors or committers: Peng He 
>>
>> Lines checked: 58, Warnings: 1, Errors: 1
>>
>>
>>
>>
>>
>> Please check this out.  If you feel there has been an error, please email 
>> 

Re: [ovs-dev] [External] Re:[ovs-dev, ovs-dev, v2, 4/4] dpif-netdev: fix inconsistent processing between ukey and megaflow

2022-09-22 Thread . 贺鹏
Hi, zhike,

It's difficult to give a very clear sequences about how this inconsistency
happens, but I can give you more details.

This is observed in our production environment. The correct megaflow should
encap packets with vxlan header and send out, but the action is drop.
This is usually because the neigh info is not available at the moment when
the upcall happens.

Normally, the drop action is ephemeral, and reavalidator will later modify
the megaflow's action into the tnl_push.

But there are a few of cases, only happened 1~2 times in a year, where the
drop actions will never be replaced by tnl_push.

just like in the commits mentioned,

"The coverage command shows revalidators have dumped several times,
however the correct actions are not set. This implies that the ukey's
action does not equal to the meagaflow's, i.e. revalidators think the
underlying

megaflow's actions are correct however they are not."


I do not know how this happened, but I do think this inconsistent
processing could be one of the reasons.

Even there is no such bug, I think keeping processing inconsistent is necessary.





On Wed, Sep 21, 2022 at 5:57 PM 王志克  wrote:

> Hi Hepeng,
>
>
>
> Can you please explain the sequence that how this inconsistence could
> happen? Why you believe the current actions in existing netdev_flow is old?
>
>
>
> Thanks.
>
>
>
> Br,
>
> wangzhike
>
>
>
>
>
>
>
>
>
>
> *
>
> [ovs-dev,ovs-dev,v2,4/4] dpif-netdev: fix inconsistent processing between
> ukey and megaflow
>
> Message ID
>
> 20220604151857.66550-4-hepeng.0...@bytedance.com
>
> State
>
> New
>
> Headers
>
> show
>
> Series
>
> [ovs-dev,ovs-dev,v2,1/4] ofproto-dpif-upcall: fix push_dp_ops
> |
> expand
> Checks
>
> Context
>
> Check
>
> Description
>
> ovsrobot/apply-robot
>
> *warning*
>
> apply and check: warning
> 
>
> ovsrobot/github-robot-_Build_and_Test
>
> *success*
>
> github build: passed
> 
>
> ovsrobot/intel-ovs-compilation
>
> *success*
>
> test: success
> 
> Commit Message
>
> Peng He
> June
> 4, 2022, 3:18 p.m. UTC
>
> When PMDs perform upcalls, the newly generated ukey will replace
>
> the old, however, the newly generated mageflow will be discard
>
> to reuse the old one without checking if the actions of new and
>
> old are equal.
>
>
>
> We observe in the production environment that sometimes a megaflow
>
> with wrong actions keep staying in datapath. The coverage command shows
>
> revalidators have dumped serveral times, however the correct
>
> actions are not set. This implies that the ukey's action does not
>
> equal to the meagaflow's, i.e. revalidators think the underlying
>
> megaflow's actions are correct however they are not.
>
>
>
> We also check the megaflow using the ofproto/trace command, and the
>
> actions are not matched with the ones in the actual magaflow. By
>
> performing a revalidator/purge command, the right actions are set.
>
>
>
> *Signed-off-by: Peng He  >*
>
> ---
>
>  lib/dpif-netdev.c | 17 -
>
>  1 file changed, 16 insertions(+), 1 deletion(-)
>
> Comments
>
> 0-day Robot
> June
> 4, 2022, 3:44 p.m. UTC | #1 
>
> Bleep bloop.  Greetings Peng He, I am a robot and I have tried out your patch.
>
> Thanks for your contribution.
>
>
>
> I encountered some error that I wasn't expecting.  See the details below.
>
>
>
>
>
> checkpatch:
>
> ERROR: Author Peng He  needs to sign off.
>
> WARNING: Unexpected sign-offs from developers who are not authors or 
> co-authors or committers: Peng He 
>
> Lines checked: 58, Warnings: 1, Errors: 1
>
>
>
>
>
> Please check this out.  If you feel there has been an error, please email 
> acon...@redhat.com
>
>
>
> Thanks,
>
> 0-day Robot
>
> 1638948diff
> 
> mbox
> 
> series 
> Patch
>
> *diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c*
>
> *index ff57b3961..985c25c58 100644*
>
> *--- a/lib/dpif-netdev.c*
>
> *+++ b/lib/dpif-netdev.c*
>
> *@@ -8305,7 +8305,22 @@*  handle_packet_upcall(struct dp_netdev_pmd_thread 
> *pmd,
>
>   * to be locking revalidators out of making flow modifications. */
>
>  ovs_mutex_lock(>flow_mutex);
>
>  netdev_flow = dp_netdev_pmd_lookup_flow(pmd, key,