Jianbo Liu <[email protected]> 于2021年6月1日周二 10:02写道:
> The 05/31/2021 19:58, Ilya Maximets wrote: > > On 5/28/21 5:16 AM, Jianbo Liu wrote: > > > The 05/27/2021 12:43, Ilya Maximets wrote: > > >> On 5/27/21 12:29 PM, Jianbo Liu wrote: > > >>> There is a race condidtion between purger and handler in > dpif-netlink. > > >>> Handler may create new ukey and install it while executing > 'ovs-appctl > > >>> revalidator/purge' command. However, before handler calls > > >>> transition_ukey() in handle_upcalls(), purger may get this ukey from > > >>> umap, then evict and delete it. This will trigger ovs_abort in > > >>> transition_ukey() for handler because it is trying to set state to > > >>> EVICTED or OPERATIONAL, but ukey is already in DELETED state. > > >>> To fix this issue, purger must not delete ukey in VISIBLE state. > > >> > > >> Hi. This is not a good thing to trigger abort(), but "purge" means > > >> "purge". And, AFAIU, most of ukeys are visible. This is a purely > > >> debug interface that was introduced to test functionality of OVS and > > >> should not be used in production environment. The fact that "purge" > > >> doesn't remove visible ukeys, in my opinion, just makes the appctl > > >> "revalidator/purge" useless. Can we replace abort() with rate-limited > > > > > > But currently ukey is also not removed if we can't hold its lock. > > > Besides, new ukey will be installed while purging. We can't make sure > > > that no visilbe/operational ukeys exist at the monent this command is > > > finished. So in order to resolve the race, visible ukey should be kept, > > > it is just like a new incoming ukey. > > > > The purpose of this call is to be used in the testsuite where we have > > a full control over traffic that flows through OVS ports, so locking > > issue or installation of a new ukey is not possible there. > > > > > > > >> error message for this scenario instead to avoid process termination? > > > > > > No sure if there are issues, because ukey is deleted (maybe freed) > while > > > handler still access it by the pointer. > > > > Well, if there is a race that could lead to a crash, it make sense > > fixing, but it doesn't make sense to fix only for one datapath type. > > It is related to handle_upcalls(), and dpif-netlink calls it, but > dpif-netdev does not, right? > even only dpif-netlink calls it, the userspace datapath has the similar code in the *push_dp_ops*, and currently I see a different processing logic here. So I guess this issue still exists in the userspace datapath. Am I right? I think the revalidator/purge command has its usage in the production environments. it provides, perhaps the only way, to fix issues when ukey and megaflows are inconsistent when there is a bug in revalidate code. Am I right? I think maybe we can limit the push_dp_ops to only do the transcation of the ukey state to EVICTED or OPERATIONAL if its state < DELETED, without changing the process of purge. and by the way, it seems that the kernel datapath has now different operations with the userspace datapath, the ukey turns into OPERATIONAL state while in the userspace, ukey is first set in the state of VISIBLE. It will turn into OPERATIONAL right after its megaflow gets dumped. > > > > > If it's not possible to fix keeping the semantics of the call, I'd > > prefer to not fix that at all as we need this appctl for unit testing, > > other use cases are not supported. i.e. there is no point to change > > this call if the fix beaks its main functionality. > > I don't think it breaks. If it's used only for unit test as you > suggested, there should not be VISIBLE ukey while purge for > dpif-netlink, so the behavior is still the same. > > > > > >> > > >> BTW, how did you catch this? Is it reproducible with system tests? > > > > > > We found this issue when testing CT offload, do purge without stop > > > traffic. > > > > What is the point of calling 'purge' in this scenario if it will > > not purge visible ukeys? > > Maybe stability, I think. We don't know what command user will run on > his system. Whatever he do, we must make sure openvswitch does not > crash, right? > > > > > > > > > Thanks! > > > Jianbo > > > > > >> > > >> Best regards, Ilya Maximets. > > > > > > > -- > _______________________________________________ > dev mailing list > [email protected] > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > -- hepeng _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
