On Thu, Feb 15, 2018 at 5:30 AM, Mateusz Guzik <mjgu...@gmail.com> wrote: > On Tue, Feb 13, 2018 at 05:14:38PM +0900, Ryota Ozaki wrote: >> panic: kernel diagnostic assertion "(psref->psref_cpu == curcpu())" >> failed: file "/(hidden)/sys/kern/subr_psref.c", line 317 passive >> reference transferred from CPU 0 to CPU 3 >> >> I first thought that something went wrong in an ioctl handler >> for example curlwp_bindx was called doubly and LP_BOUND was unset >> then the LWP was migrated to another CPU. However, this kind of >> assumptions was denied by KASSERTs in psref_release. So I doubted >> of LP_BOUND and found there is a race condition on LWP migrations. >> >> curlwp_bind sets LP_BOUND to l_pflags of curlwp and that prevents >> curlwp from migrating to another CPU until curlwp_bindx is called. > > The bug you found (and I trimmed) looks like the culprit, but there is > an extra problem which probably happens to not manifest itself in terms > of code generation: the bind/unbind inlines lack compiler barriers. See > KPREEMPT_* inlines for comparison. The diff is definitely trivial: > > diff --git a/sys/sys/lwp.h b/sys/sys/lwp.h > index 47d162271f9c..f18b76b984e4 100644 > --- a/sys/sys/lwp.h > +++ b/sys/sys/lwp.h > @@ -536,6 +536,7 @@ curlwp_bind(void) > > bound = curlwp->l_pflag & LP_BOUND; > curlwp->l_pflag |= LP_BOUND; > + __insn_barrier(); > > return bound; > } > @@ -545,6 +546,7 @@ curlwp_bindx(int bound) > { > > KASSERT(curlwp->l_pflag & LP_BOUND); > + __insn_barrier(); > curlwp->l_pflag ^= bound ^ LP_BOUND; > } > > -- > Mateusz Guzik <mjguzik gmail.com>
Right, the barriers are basically required as you say. Note that for the current usage of curlwp_bind, i.e., preventing LWP migrations between psref_acquire and psref_release, the barriers are not must, IIUC. Because psref_acquire and psref_release are functions and no reordering probably happen between psref_* and curlwp_*. Even if a compiler optimization tries to reorder, curlwp_* won't cross psref_* because psref_* uses percpu that uses kpreempt APIs that call __insn_barrier. Anyway I'll add the barriers. Thanks, ozaki-r