The 05/31/2021 19:58, Ilya Maximets wrote: > On 5/28/21 5:16 AM, Jianbo Liu wrote: > > The 05/27/2021 12:43, Ilya Maximets wrote: > >> On 5/27/21 12:29 PM, Jianbo Liu wrote: > >>> There is a race condidtion between purger and handler in dpif-netlink. > >>> Handler may create new ukey and install it while executing 'ovs-appctl > >>> revalidator/purge' command. However, before handler calls > >>> transition_ukey() in handle_upcalls(), purger may get this ukey from > >>> umap, then evict and delete it. This will trigger ovs_abort in > >>> transition_ukey() for handler because it is trying to set state to > >>> EVICTED or OPERATIONAL, but ukey is already in DELETED state. > >>> To fix this issue, purger must not delete ukey in VISIBLE state. > >> > >> Hi. This is not a good thing to trigger abort(), but "purge" means > >> "purge". And, AFAIU, most of ukeys are visible. This is a purely > >> debug interface that was introduced to test functionality of OVS and > >> should not be used in production environment. The fact that "purge" > >> doesn't remove visible ukeys, in my opinion, just makes the appctl > >> "revalidator/purge" useless. Can we replace abort() with rate-limited > > > > But currently ukey is also not removed if we can't hold its lock. > > Besides, new ukey will be installed while purging. We can't make sure > > that no visilbe/operational ukeys exist at the monent this command is > > finished. So in order to resolve the race, visible ukey should be kept, > > it is just like a new incoming ukey. > > The purpose of this call is to be used in the testsuite where we have > a full control over traffic that flows through OVS ports, so locking > issue or installation of a new ukey is not possible there. > > > > >> error message for this scenario instead to avoid process termination? > > > > No sure if there are issues, because ukey is deleted (maybe freed) while > > handler still access it by the pointer. > > Well, if there is a race that could lead to a crash, it make sense > fixing, but it doesn't make sense to fix only for one datapath type.
It is related to handle_upcalls(), and dpif-netlink calls it, but dpif-netdev does not, right? > > If it's not possible to fix keeping the semantics of the call, I'd > prefer to not fix that at all as we need this appctl for unit testing, > other use cases are not supported. i.e. there is no point to change > this call if the fix beaks its main functionality. I don't think it breaks. If it's used only for unit test as you suggested, there should not be VISIBLE ukey while purge for dpif-netlink, so the behavior is still the same. > > >> > >> BTW, how did you catch this? Is it reproducible with system tests? > > > > We found this issue when testing CT offload, do purge without stop > > traffic. > > What is the point of calling 'purge' in this scenario if it will > not purge visible ukeys? Maybe stability, I think. We don't know what command user will run on his system. Whatever he do, we must make sure openvswitch does not crash, right? > > > > > Thanks! > > Jianbo > > > >> > >> Best regards, Ilya Maximets. > > > -- _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
