On Thu, Jun 30, 2022 at 11:51:52AM +0200, Claudio Jeker wrote:
> After discussing this with mpi@ and jmatthew@ we came to the conclusion
> that we need to smr_barrier() before refcnt_finalize() to ensure that no
> other CPU is between the SMR_TAILQ_FOREACH, refcnt_take() and
> smr_read_leave().
[...]
> @@ -509,7 +487,8 @@ route_input(struct mbuf *m0, struct sock
> return;
> }
>
> - SRPL_FOREACH(rop, &sr, &rtptable.rtp_list, rop_list) {
> + smr_read_enter();
> + SMR_TAILQ_FOREACH(rop, &rtptable.rtp_list, rop_list) {
> /*
> * If route socket is bound to an address family only send
> * messages that match the address family. Address family
> @@ -519,7 +498,8 @@ route_input(struct mbuf *m0, struct sock
> rop->rop_proto != sa_family)
> continue;
>
> -
> + refcnt_take(&rop->rop_refcnt);
> + smr_read_leave();
> so = rop->rop_socket;
> solock(so);
>
> @@ -579,8 +559,10 @@ route_input(struct mbuf *m0, struct sock
> rtm_sendup(so, m);
> next:
> sounlock(so);
> + smr_read_enter();
> + refcnt_rele_wake(&rop->rop_refcnt);
This does not look correct.

The smr_read_enter() before refcnt_rele_wake() does not restore safety:
smr_barrier() only waits for read-side critical sections that were
entered before it, so it can complete during the window after
smr_read_leave(). If refcnt_rele_wake() then drops the final reference,
refcnt_finalize() returns and rop can be freed, yet
SMR_TAILQ_FOREACH() still evaluates SMR_TAILQ_NEXT() on rop, which is
a use-after-free.

Also, SMR_TAILQ_NEXT(rop) itself may be dangling by the time this
thread re-enters the read-side critical section: while the thread was
outside it, another thread could have unlinked and freed rop's
successor.
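
To make the window concrete, a sketch of one bad interleaving
(pseudocode timeline; the "deleter" column stands for whichever path
runs smr_barrier() + refcnt_finalize() before freeing rop):

    reader (route_input)                 deleter
    --------------------                 -------
    smr_read_enter()
    rop found via SMR_TAILQ_FOREACH()
    refcnt_take(&rop->rop_refcnt)
    smr_read_leave()
                                         SMR_TAILQ_REMOVE(rop)
                                         smr_barrier()     /* completes:
                                                              no reader
                                                              inside */
                                         refcnt_finalize() /* blocks on
                                                              our ref */
    smr_read_enter()  /* too late: barrier already passed,
                         this section does not protect rop */
    refcnt_rele_wake(&rop->rop_refcnt)
                                         refcnt_finalize() returns
                                         pool_put/free(rop)
    SMR_TAILQ_NEXT(rop)  /* use-after-free */
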