On Tue, Feb 13, 2018 at 05:14:38PM +0900, Ryota Ozaki wrote:
>   panic: kernel diagnostic assertion "(psref->psref_cpu == curcpu())"
> failed: file "/(hidden)/sys/kern/subr_psref.c", line 317 passive
> reference transferred from CPU 0 to CPU 3
>
> I first thought that something went wrong in an ioctl handler,
> for example that curlwp_bindx was called twice, LP_BOUND was unset,
> and then the LWP was migrated to another CPU. However, that
> assumption was ruled out by the KASSERTs in psref_release. So I
> suspected LP_BOUND itself and found a race condition in LWP migration.
>
> curlwp_bind sets LP_BOUND in l_pflag of curlwp, and that prevents
> curlwp from migrating to another CPU until curlwp_bindx is called.

The bug you found (and I trimmed) looks like the culprit, but there is
an extra problem, which probably just happens not to show up in the
generated code: the bind/unbind inlines lack compiler barriers, so
nothing stops the compiler from moving memory accesses across the
l_pflag updates. See the KPREEMPT_* inlines for comparison. The diff is
trivial:

diff --git a/sys/sys/lwp.h b/sys/sys/lwp.h
index 47d162271f9c..f18b76b984e4 100644
--- a/sys/sys/lwp.h
+++ b/sys/sys/lwp.h
@@ -536,6 +536,7 @@ curlwp_bind(void)

        bound = curlwp->l_pflag & LP_BOUND;
        curlwp->l_pflag |= LP_BOUND;
+       __insn_barrier();

        return bound;
 }
@@ -545,6 +546,7 @@ curlwp_bindx(int bound)
 {

        KASSERT(curlwp->l_pflag & LP_BOUND);
+       __insn_barrier();
        curlwp->l_pflag ^= bound ^ LP_BOUND;
 }
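
For anyone who wants to poke at the pattern outside the kernel tree,
below is a rough userland sketch of the two inlines with the barriers in
place. It is not the kernel code: insn_barrier(), bind_sketch(),
bindx_sketch(), the global l_pflag and the LP_BOUND value are all
stand-ins invented for illustration, and the barrier macro assumes GCC
or Clang inline asm syntax.

```c
#include <assert.h>

/*
 * Assumed stand-in for the kernel's __insn_barrier(): an empty asm
 * statement with a "memory" clobber. It emits no instruction; it only
 * stops the compiler from moving memory accesses across it.
 */
#define insn_barrier()	__asm__ __volatile__("" ::: "memory")

#define LP_BOUND	0x1	/* hypothetical value, illustration only */

/* Hypothetical stand-in for curlwp->l_pflag. */
static int l_pflag;

/* Mirrors the shape of curlwp_bind(): set the flag, then the barrier. */
static int
bind_sketch(void)
{
	int bound;

	bound = l_pflag & LP_BOUND;
	l_pflag |= LP_BOUND;
	/* Accesses in the bound section must not be hoisted above this. */
	insn_barrier();

	return bound;
}

/* Mirrors the shape of curlwp_bindx(): barrier, then restore the flag. */
static void
bindx_sketch(int bound)
{

	assert(l_pflag & LP_BOUND);
	/* Accesses in the bound section must not sink below this. */
	insn_barrier();
	/* Clears LP_BOUND only if it was clear before the matching bind. */
	l_pflag ^= bound ^ LP_BOUND;
}
```

Nested use works as in the kernel: an inner bind_sketch() returns
LP_BOUND, so the matching bindx_sketch() leaves the flag set, and only
the outermost call clears it. The barrier constrains the compiler only;
it says nothing about CPU-level ordering.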

-- 
Mateusz Guzik <mjguzik gmail.com>
