On Thu, Jun 30, 2022 at 03:46:35PM +0000, Visa Hankala wrote:
> On Thu, Jun 30, 2022 at 11:51:52AM +0200, Claudio Jeker wrote:
> > After discussing this with mpi@ and jmatthew@ we came to the conclusion
> > that we need to smr_barrier() before refcnt_finalize() to ensure that no
> > other CPU is between the SMR_TAILQ_FOREACH, refcnt_take() and
> > smr_read_leave().
> 
> [...]
> 
> > @@ -509,7 +487,8 @@ route_input(struct mbuf *m0, struct sock
> >             return;
> >     }
> >  
> > -   SRPL_FOREACH(rop, &sr, &rtptable.rtp_list, rop_list) {
> > +   smr_read_enter();
> > +   SMR_TAILQ_FOREACH(rop, &rtptable.rtp_list, rop_list) {
> >             /*
> >              * If route socket is bound to an address family only send
> >              * messages that match the address family. Address family
> > @@ -519,7 +498,8 @@ route_input(struct mbuf *m0, struct sock
> >                 rop->rop_proto != sa_family)
> >                     continue;
> >  
> > -
> > +           refcnt_take(&rop->rop_refcnt);
> > +           smr_read_leave();
> >             so = rop->rop_socket;
> >             solock(so);
> >  
> > @@ -579,8 +559,10 @@ route_input(struct mbuf *m0, struct sock
> >             rtm_sendup(so, m);
> >  next:
> >             sounlock(so);
> > +           smr_read_enter();
> > +           refcnt_rele_wake(&rop->rop_refcnt);
> 
> This does not look correct.
> 
> smr_barrier() can proceed after smr_read_leave(), so refcnt_rele_wake()
> might drop the final reference and this thread can no longer access
> rop safely (SMR_TAILQ_NEXT() inside SMR_TAILQ_FOREACH()).
> 
> Also, SMR_TAILQ_NEXT() of rop becomes potentially dangling after
> smr_read_leave(). After this thread leaves the read-side critical
> section, another thread might free rop's successor.

So we need to either smr_barrier() before and after the refcnt_finalize()
to make sure that the rop pointer remains stable in both cases or we alter
the SMR_TAILQ_FOREACH() loop so that SMR_TAILQ_NEXT can be grabbed before
refcnt_rele_wake().

While the double smr_barrier() is trivial it is not ideal and I think it
is better to adjust the loop since SMR loops with sleep points is a
somewhat common issue and so we should have a good clear way on how to
solve it.

I'll send out a new diff next week
-- 
:wq Claudio

Reply via email to