fstrans_start(vp->v_mount) and vgone

Taylor R Campbell Tue, 11 Apr 2023 03:14:10 -0700

Last year hannken@ changed fstrans(9) to use a hash table mapping
struct mount pointers to fstrans info, in order to fix a crash found
by syzkaller arising from a race between fstrans_start(vp->v_mount)
and vgone(vp):


https://syzkaller.appspot.com/bug?extid=54dc9ac0804a97b59bc6
https://mail-index.netbsd.org/source-changes/2022/07/08/msg139622.html

However, while this may avoid a _crash_ from use-after-free of
vp->v_mount->mnt_..., I'm not sure this is enough to prevent a _race_.

Suppose we have the following sequence of events:

lwp0                    lwp1
----                    ----
mp = vp->v_mount;
                        vgone(vp);
                        -> vp->v_mount = deadfs
                        unmount(mp)
                        -> free(mp)
                        mount a new file system
                        -> malloc() returns the same mp
fstrans_start(mp);

Now lwp0 has started a transaction on some totally unrelated mount
point -- and may take confused actions under the misapprehension that
vp is a vnode that unrelated mount point.

I think in order to fix this properly, we really need an operation
that atomically (a) loads vp->v_mount, and (b) starts an fstrans on
it.

It's not necessary to use a hash table indirection for this -- we can
just use pserialize and ensure that struct mount objects aren't
actually freed until all pserialize read sections are completed.

Something like this:

struct mount *
fstrans_start_vnode(struct vnode *vp)
{
        struct mount *mp;
        struct fstrans_lwp_info *fli;
        int s;

        s = pserialize_read_enter();
        mp = atomic_load_relaxed(&vp->v_mount);
        fli = fstrans_get_lwp_info(mp, /*alloc*/false);
        if (fli == NULL) {
                pserialize_read_exit(s);
                /* XXX preallocate fli and retry with alloc=true */
        }
        if (fli->fli_trans_cnt > 0) {
                fli->fli_trans_cnt++;
                return mp;
        }
        fmi = fli->fli_mountinfo;
        if (__predict_true(grant_lock(fmi, ...))) {
                ...
        }
        ...
}

And fstrans_mount_dtor can do pserialize_perform (or just xc_barrier)
to wait for all these pserialize read sections to complete before
freeing mp.

Then, any code that currently does

        mp = vp->v_mount;
        fstrans_start(mp);

or equivalent, can be rewritten to do:

        mp = fstrans_start_vnode(vp);

Perhaps we can even forbid direct access to vp->v_mount to catch any
mistakes of this class -- and if there's a lot of access to it that
doesn't match this pattern, we could add a vnode_mount(vp) accessor to
assert that the caller holds an fstrans.

Thoughts?

fstrans_start(vp->v_mount) and vgone

Reply via email to