Hi,

Thanks William and Ilya for a detailed revisiting of the origin of the
problem. I learned a lot.

I now understand that the mix using of RCU and refcounts is not
intended in the first place.
But my point is that mix using RCU and refcounts now gives you more
choices, and actually eases the code changes.

For example, the code for * ofproto_dpif_lookup_by_name* or other
ofproto lookup function,
when only using refcounts, you need to change it to:

 struct ofproto_dpif *
 ofproto_dpif_lookup_by_uuid(const struct uuid *uuid)
 {
     struct ofproto_dpif *ofproto;

     HMAP_FOR_EACH_WITH_HASH (ofproto, all_ofproto_dpifs_by_uuid_node,
                              uuid_hash(uuid), &all_ofproto_dpifs_by_uuid) {
         if (uuid_equals(&ofproto->uuid, uuid)) {

             ---> if  ovs_refcount_try_ref(ofproto)

             return ofproto;
         }
     }
     return NULL;
 }

and after finish its usage, you have to do ofproto_unref the ofproto.

This is why I said, most accessing to ofproto is ephemeral. If you
change to use the pure refcounts solution, you have
to add refcount and release it every time you access the ofproto. We
should be more careful and remember to unref
the ofproto.

However, if using RCU and refcounts, in the above case, you do not
need to change the code, since the RCU ensures that
these ephemeral accesses are safe.

you only need to add refcount, when you find that the pointer to
ofproto lives longer than one grace period.


This is why in my patch, I do not add ref to ofproto after its
creation. I agree the patch is not complete and has issues,
and understand it could confuse people if changes into mix RCU and
refcounts version.


William Tu <u9012...@gmail.com> 于2021年2月26日周五 上午2:32写道:
>
> On Thu, Feb 25, 2021 at 4:54 AM Ilya Maximets <i.maxim...@ovn.org> wrote:
> >
> > On 2/25/21 4:32 AM, Guohongzhi (Russell Lab) wrote:
> > > Refcount and RCU are not mutually exclusive.
> > > IMO, the main reason for this problem is that the rule uses both refcount 
> > > and rcu, while the ofproto uses only rcu, and the rule retains the 
> > > pointer of the ofproto. More importantly, ofproto cannot guarantee a 
> > > longer grace period than the rule.
> > >
> >
> > I understand that refcount and RCU are not mutually exclusive.
> > However, in this particular case RCU for ofproto was introduced for one
> > and only one reason - to avoid freeing ofproto while rules are still
> > alive.  This was done in commit f416c8d61601 ("ofproto: RCU postpone
> > rule destruction.").  The goal was to allow using rules without
> > refcounting them within a single grace period.  And that forced us
> > to postpone destruction of the ofproto for a single grace period.
> > Later commit 39c9459355b6 ("Use classifier versioning.") made it
> > possible for rules to be alive for more than one grace period, so
> > the commit made ofproto wait for 2 grace periods by double postponing.
> > As we can see now, that wasn't enough and we have to wait for more
> > than 2 grace periods in certain cases.
> >
> > Now we're introducing refcounts to wait for all rules to be destroyed
> > before destroying the ofproto.  But what is the point of having
> > RCU if we already know that if refcount is zero than all rules are
> > already destroyed and we don't need to wait anything and could just
> > destroy the ofproto?
> >
> > RCU doesn't protect ofproto itself here, it actually protects rules,
> > i.e. keeps ofproto alive while rules alive, but we can fully replace
> > this with refcounts and I see no value in having RCU additionally.
> >
> > To have a full picture: right now we also have groups protected by
> > RCU, so we need to refcount ofproto for them too.  But that is almost
> > exactly same situation as we have with rules.
> >
> Thanks Guohongzhi, Hepeng, and Ilya, I learned a lot from your discussions.
>
> I think refcount and RCU are mutually exclusive. They are two different ways 
> of
> doing synchronization and somehow we mix them together by using RCU to
> optimize refcount.
>
> Before the commit f416c8d61601 ("ofproto: RCU postpone rule destruction."),
> we are using refcount to protect rules, and as a result every time OVS
> references
> a rule, we have to take refcount. It works ok but this probably has
> high performance
> overhead because it's doing atomic operations.
>
> So the commit f416c8d61601 optimizes it by doing
> 1) not taking refcount of rule "within a grace period"
> 2) introduce RCU for rule
> The assumption is that a grace period is like refcount == 0, so it's
> valid to do so.
> A side effect is that ofproto_destroy__()  needs to be postponed.
> Note that if a rule is alive across grace period, it has to take refcount.
>
> However, later commit 39c9459355b6 ("Use classifier versioning.")
> makes rule alive across grace period. It breaks the first commit's assumption
> and it has to introduce double postponing for ofproto, which we found
> out is the problem now.
>
> Since my point is RCU and refcount are mutually exclusive (A grace period
> is like refcount ==0) , I agree with Ilya that we can just use refcount to fix
> ofproto issue.
>
> Regards,
> William



-- 
hepeng
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to