Re: [ovs-dev] [ovs-dev v2] ofproto: add refcount to ofproto to fix crash at ofproto_destroy

Ilya Maximets Mon, 01 Mar 2021 12:15:18 -0800

On 2/27/21 9:53 AM, 贺鹏 wrote:
> Hi,
> 
> Thanks William and Ilya for a detailed revisiting of the origin of the
> problem. I learned a lot.
> 
> I now understand that the mix using of RCU and refcounts is not
> intended in the first place.
> But my point is that mix using RCU and refcounts now gives you more
> choices, and actually eases the code changes.
> 
> For example, the code for * ofproto_dpif_lookup_by_name* or other
> ofproto lookup function,
> when only using refcounts, you need to change it to:
> 
>  struct ofproto_dpif *
>  ofproto_dpif_lookup_by_uuid(const struct uuid *uuid)
>  {
>      struct ofproto_dpif *ofproto;
> 
>      HMAP_FOR_EACH_WITH_HASH (ofproto, all_ofproto_dpifs_by_uuid_node,
>                               uuid_hash(uuid), &all_ofproto_dpifs_by_uuid) {
>          if (uuid_equals(&ofproto->uuid, uuid)) {
> 
>              ---> if  ovs_refcount_try_ref(ofproto)
> 
>              return ofproto;
>          }
>      }
>      return NULL;
>  }


That is an interesting example here...
I can't help but notice that this function is typically called
from different handler or pmd threads and modified by the main thread
while upcalls enabled.  And hmap is not a thread-safe structure.
I guess, we have another possible problem here.  We need to protect
at least this hmap and the other one with rw lock or something...
Am I right or am I missing something?  What else we need to protect?

> 
> and after finish its usage, you have to do ofproto_unref the ofproto.
> 
> This is why I said, most accessing to ofproto is ephemeral. If you
> change to use the pure refcounts solution, you have
> to add refcount and release it every time you access the ofproto. We
> should be more careful and remember to unref
> the ofproto.
> 
> However, if using RCU and refcounts, in the above case, you do not
> need to change the code, since the RCU ensures that
> these ephemeral accesses are safe.
> 
> you only need to add refcount, when you find that the pointer to
> ofproto lives longer than one grace period.
> 
> 
> This is why in my patch, I do not add ref to ofproto after its
> creation. I agree the patch is not complete and has issues,
> and understand it could confuse people if changes into mix RCU and
> refcounts version.
> 
> 
> William Tu <[email protected]> 于2021年2月26日周五 上午2:32写道：
>>
>> On Thu, Feb 25, 2021 at 4:54 AM Ilya Maximets <[email protected]> wrote:
>>>
>>> On 2/25/21 4:32 AM, Guohongzhi (Russell Lab) wrote:
>>>> Refcount and RCU are not mutually exclusive.
>>>> IMO, the main reason for this problem is that the rule uses both refcount 
>>>> and rcu, while the ofproto uses only rcu, and the rule retains the pointer 
>>>> of the ofproto. More importantly, ofproto cannot guarantee a longer grace 
>>>> period than the rule.
>>>>
>>>
>>> I understand that refcount and RCU are not mutually exclusive.
>>> However, in this particular case RCU for ofproto was introduced for one
>>> and only one reason - to avoid freeing ofproto while rules are still
>>> alive.  This was done in commit f416c8d61601 ("ofproto: RCU postpone
>>> rule destruction.").  The goal was to allow using rules without
>>> refcounting them within a single grace period.  And that forced us
>>> to postpone destruction of the ofproto for a single grace period.
>>> Later commit 39c9459355b6 ("Use classifier versioning.") made it
>>> possible for rules to be alive for more than one grace period, so
>>> the commit made ofproto wait for 2 grace periods by double postponing.
>>> As we can see now, that wasn't enough and we have to wait for more
>>> than 2 grace periods in certain cases.
>>>
>>> Now we're introducing refcounts to wait for all rules to be destroyed
>>> before destroying the ofproto.  But what is the point of having
>>> RCU if we already know that if refcount is zero than all rules are
>>> already destroyed and we don't need to wait anything and could just
>>> destroy the ofproto?
>>>
>>> RCU doesn't protect ofproto itself here, it actually protects rules,
>>> i.e. keeps ofproto alive while rules alive, but we can fully replace
>>> this with refcounts and I see no value in having RCU additionally.
>>>
>>> To have a full picture: right now we also have groups protected by
>>> RCU, so we need to refcount ofproto for them too.  But that is almost
>>> exactly same situation as we have with rules.
>>>
>> Thanks Guohongzhi, Hepeng, and Ilya, I learned a lot from your discussions.
>>
>> I think refcount and RCU are mutually exclusive. They are two different ways 
>> of
>> doing synchronization and somehow we mix them together by using RCU to
>> optimize refcount.
>>
>> Before the commit f416c8d61601 ("ofproto: RCU postpone rule destruction."),
>> we are using refcount to protect rules, and as a result every time OVS
>> references
>> a rule, we have to take refcount. It works ok but this probably has
>> high performance
>> overhead because it's doing atomic operations.
>>
>> So the commit f416c8d61601 optimizes it by doing
>> 1) not taking refcount of rule "within a grace period"
>> 2) introduce RCU for rule
>> The assumption is that a grace period is like refcount == 0, so it's
>> valid to do so.
>> A side effect is that ofproto_destroy__()  needs to be postponed.
>> Note that if a rule is alive across grace period, it has to take refcount.
>>
>> However, later commit 39c9459355b6 ("Use classifier versioning.")
>> makes rule alive across grace period. It breaks the first commit's assumption
>> and it has to introduce double postponing for ofproto, which we found
>> out is the problem now.
>>
>> Since my point is RCU and refcount are mutually exclusive (A grace period
>> is like refcount ==0) , I agree with Ilya that we can just use refcount to 
>> fix
>> ofproto issue.
>>
>> Regards,
>> William
> 
> 
> 

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [ovs-dev v2] ofproto: add refcount to ofproto to fix crash at ofproto_destroy

Reply via email to