On Wed, Aug 10, 2022 at 11:08:06AM +0200, Claudio Jeker wrote:
> On Fri, Jul 01, 2022 at 04:03:21PM +0000, Visa Hankala wrote:
> > On Fri, Jul 01, 2022 at 09:59:11AM +0200, Claudio Jeker wrote:
> > > On Thu, Jun 30, 2022 at 03:46:35PM +0000, Visa Hankala wrote:
> > > > On Thu, Jun 30, 2022 at 11:51:52AM +0200, Claudio Jeker wrote:
> > > > > After discussing this with mpi@ and jmatthew@ we came to the 
> > > > > conclusion
> > > > > that we need to smr_barrier() before refcnt_finalize() to ensure that 
> > > > > no
> > > > > other CPU is between the SMR_TAILQ_FOREACH, refcnt_take() and
> > > > > smr_read_leave().
> > > > 
> > > > [...]
> > > > 
> > > > > @@ -509,7 +487,8 @@ route_input(struct mbuf *m0, struct sock
> > > > >               return;
> > > > >       }
> > > > >  
> > > > > -     SRPL_FOREACH(rop, &sr, &rtptable.rtp_list, rop_list) {
> > > > > +     smr_read_enter();
> > > > > +     SMR_TAILQ_FOREACH(rop, &rtptable.rtp_list, rop_list) {
> > > > >               /*
> > > > >                * If route socket is bound to an address family only 
> > > > > send
> > > > >                * messages that match the address family. Address 
> > > > > family
> > > > > @@ -519,7 +498,8 @@ route_input(struct mbuf *m0, struct sock
> > > > >                   rop->rop_proto != sa_family)
> > > > >                       continue;
> > > > >  
> > > > > -
> > > > > +             refcnt_take(&rop->rop_refcnt);
> > > > > +             smr_read_leave();
> > > > >               so = rop->rop_socket;
> > > > >               solock(so);
> > > > >  
> > > > > @@ -579,8 +559,10 @@ route_input(struct mbuf *m0, struct sock
> > > > >               rtm_sendup(so, m);
> > > > >  next:
> > > > >               sounlock(so);
> > > > > +             smr_read_enter();
> > > > > +             refcnt_rele_wake(&rop->rop_refcnt);
> > > > 
> > > > This does not look correct.
> > > > 
> > > > smr_barrier() can proceed after smr_read_leave(), so refcnt_rele_wake()
> > > > might drop the final reference and this thread can no longer access
> > > > rop safely (SMR_TAILQ_NEXT() inside SMR_TAILQ_FOREACH()).
> > > > 
> > > > Also, SMR_TAILQ_NEXT() of rop becomes potentially dangling after
> > > > smr_read_leave(). After this thread leaves the read-side critical
> > > > section, another thread might free rop's successor.
> > > 
> > > So we need to either smr_barrier() before and after the refcnt_finalize()
> > > to make sure that the rop pointer remains stable in both cases or we alter
> > > the SMR_TAILQ_FOREACH() loop so that SMR_TAILQ_NEXT can be grabbed before
> > > refcnt_rele_wake().
> > > 
> > > While the double smr_barrier() is trivial it is not ideal and I think it
> > > is better to adjust the loop since SMR loops with sleep points is a
> > > somewhat common issue and so we should have a good clear way on how to
> > > solve it.
> > 
> > Adjusting SMR_TAILQ_FOREACH() will not help.
> > 
> > In general, a reader cannot resume a lockless iteration after it has
> > left the read-side critical section and crossed a sleep point. The
> > guarantee of consistent(-looking) forward linkage is gone. The reader
> > no longer knows if the value of SMR_TAILQ_NEXT() is valid. If the
> > reader wants to continue with the list, it has to re-enter the read-side
> > critical section and restart the iteration.
> 
> This is not a real SMR_TAILQ_FOREACH() use case so trying to use
> SMR_TAILQ_FOREACH() here is not right. The code wants to walk the list of
> route pcbs linked via rop_list. The code just needs to walk all active
> connections and does not care about races with sockets that are concurrently
> closed or opened. In the first case SS_CANTRCVMORE will be set and the
> socket is skipped and in the second case the socket is simply ignored
> because new sockets are inserted at the head of the list.
>  
> It is not a lockless iteration over the full list. It is not required to
> be either. The only thing that matters is that the forward linkage is
> consitent (not pointing to invalid objects).
> 
> There is no need to restart the iteration because element on the list can
> not be reinserted. They can only be removed and a removed element does not
> get any message anyway (either by not visiting the object or by skipping
> it in the loop).
> 
> The refcnt ensures that the currently used pcb is not freed before the
> next element is picked. As long as the refcnt is hold the object can't be
> removed.

Lets assume that another thread begins to detach rop while
route_input() sleeps. The reference prevents early freeing of rop.

Lets assume further that yet another thread detaches and frees the
successor of rop while the first thread is still sleeping. What will
SMR_LIST_NEXT(rop, rop_list) point to?

Reply via email to