On 2/27/21 9:53 AM, 贺鹏 wrote:
> Hi,
>
> Thanks William and Ilya for a detailed revisiting of the origin of the
> problem. I learned a lot.
>
> I now understand that the mix using of RCU and refcounts is not
> intended in the first place.
> But my point is that mix using RCU and refcounts now gives you more
> choices, and actually eases the code changes.
>
> For example, the code for * ofproto_dpif_lookup_by_name* or other
> ofproto lookup function,
> when only using refcounts, you need to change it to:
>
> struct ofproto_dpif *
> ofproto_dpif_lookup_by_uuid(const struct uuid *uuid)
> {
> struct ofproto_dpif *ofproto;
>
> HMAP_FOR_EACH_WITH_HASH (ofproto, all_ofproto_dpifs_by_uuid_node,
> uuid_hash(uuid), &all_ofproto_dpifs_by_uuid) {
> if (uuid_equals(&ofproto->uuid, uuid)) {
>
> ---> if ovs_refcount_try_ref(ofproto)
>
> return ofproto;
> }
> }
> return NULL;
> }
That is an interesting example here...
I can't help but notice that this function is typically called
from different handler or pmd threads and modified by the main thread
while upcalls enabled. And hmap is not a thread-safe structure.
I guess, we have another possible problem here. We need to protect
at least this hmap and the other one with rw lock or something...
Am I right or am I missing something? What else we need to protect?
>
> and after finish its usage, you have to do ofproto_unref the ofproto.
>
> This is why I said, most accessing to ofproto is ephemeral. If you
> change to use the pure refcounts solution, you have
> to add refcount and release it every time you access the ofproto. We
> should be more careful and remember to unref
> the ofproto.
>
> However, if using RCU and refcounts, in the above case, you do not
> need to change the code, since the RCU ensures that
> these ephemeral accesses are safe.
>
> you only need to add refcount, when you find that the pointer to
> ofproto lives longer than one grace period.
>
>
> This is why in my patch, I do not add ref to ofproto after its
> creation. I agree the patch is not complete and has issues,
> and understand it could confuse people if changes into mix RCU and
> refcounts version.
>
>
> William Tu <[email protected]> 于2021年2月26日周五 上午2:32写道:
>>
>> On Thu, Feb 25, 2021 at 4:54 AM Ilya Maximets <[email protected]> wrote:
>>>
>>> On 2/25/21 4:32 AM, Guohongzhi (Russell Lab) wrote:
>>>> Refcount and RCU are not mutually exclusive.
>>>> IMO, the main reason for this problem is that the rule uses both refcount
>>>> and rcu, while the ofproto uses only rcu, and the rule retains the pointer
>>>> of the ofproto. More importantly, ofproto cannot guarantee a longer grace
>>>> period than the rule.
>>>>
>>>
>>> I understand that refcount and RCU are not mutually exclusive.
>>> However, in this particular case RCU for ofproto was introduced for one
>>> and only one reason - to avoid freeing ofproto while rules are still
>>> alive. This was done in commit f416c8d61601 ("ofproto: RCU postpone
>>> rule destruction."). The goal was to allow using rules without
>>> refcounting them within a single grace period. And that forced us
>>> to postpone destruction of the ofproto for a single grace period.
>>> Later commit 39c9459355b6 ("Use classifier versioning.") made it
>>> possible for rules to be alive for more than one grace period, so
>>> the commit made ofproto wait for 2 grace periods by double postponing.
>>> As we can see now, that wasn't enough and we have to wait for more
>>> than 2 grace periods in certain cases.
>>>
>>> Now we're introducing refcounts to wait for all rules to be destroyed
>>> before destroying the ofproto. But what is the point of having
>>> RCU if we already know that if refcount is zero than all rules are
>>> already destroyed and we don't need to wait anything and could just
>>> destroy the ofproto?
>>>
>>> RCU doesn't protect ofproto itself here, it actually protects rules,
>>> i.e. keeps ofproto alive while rules alive, but we can fully replace
>>> this with refcounts and I see no value in having RCU additionally.
>>>
>>> To have a full picture: right now we also have groups protected by
>>> RCU, so we need to refcount ofproto for them too. But that is almost
>>> exactly same situation as we have with rules.
>>>
>> Thanks Guohongzhi, Hepeng, and Ilya, I learned a lot from your discussions.
>>
>> I think refcount and RCU are mutually exclusive. They are two different ways
>> of
>> doing synchronization and somehow we mix them together by using RCU to
>> optimize refcount.
>>
>> Before the commit f416c8d61601 ("ofproto: RCU postpone rule destruction."),
>> we are using refcount to protect rules, and as a result every time OVS
>> references
>> a rule, we have to take refcount. It works ok but this probably has
>> high performance
>> overhead because it's doing atomic operations.
>>
>> So the commit f416c8d61601 optimizes it by doing
>> 1) not taking refcount of rule "within a grace period"
>> 2) introduce RCU for rule
>> The assumption is that a grace period is like refcount == 0, so it's
>> valid to do so.
>> A side effect is that ofproto_destroy__() needs to be postponed.
>> Note that if a rule is alive across grace period, it has to take refcount.
>>
>> However, later commit 39c9459355b6 ("Use classifier versioning.")
>> makes rule alive across grace period. It breaks the first commit's assumption
>> and it has to introduce double postponing for ofproto, which we found
>> out is the problem now.
>>
>> Since my point is RCU and refcount are mutually exclusive (A grace period
>> is like refcount ==0) , I agree with Ilya that we can just use refcount to
>> fix
>> ofproto issue.
>>
>> Regards,
>> William
>
>
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev