2017-01-13 14:35 GMT+01:00 Nadav Har'El <n...@scylladb.com>:

> This patch fixes a deadlock caused by lock order inversion. This deadlock,
> described in issue #817, sometimes hangs the entire VFS when a mount is
> attempted.
>
> Most of the VFS code takes locks on various nodes, and later, in internal
> ZFS code, calls vfs_busy() which takes the mount_lock.
> Before this patch, the sys_mount() code these locks in the opposite order:
> mount_lock first, and while this is held, then called lookup code which
> locked various nodes.
>
> The fix is for sys_mount() to not hold mount_lock while looking up paths,
> and only hold it while actually reading or writing the mount_list object.
> Because we do this twice in sys_mount - in the beginning to check if the
> path is not already mounted, and at the end to add the newly prepared
> mount point to the list - the patch also adds a new mutex to serialize
> calls to sys_mount() so that we won't have a situation where two concurrent
> mounts both decide they can mount the same device or on the same path.
>
> Fixes #817.
>
> Signed-off-by: Nadav Har'El <n...@scylladb.com>
> ---
>  fs/vfs/vfs_mount.cc | 26 ++++++++++++++++++--------
>  1 file changed, 18 insertions(+), 8 deletions(-)
>
> diff --git a/fs/vfs/vfs_mount.cc b/fs/vfs/vfs_mount.cc
> index 9e241ad..5258ae3 100644
> --- a/fs/vfs/vfs_mount.cc
> +++ b/fs/vfs/vfs_mount.cc
> @@ -117,14 +117,22 @@ sys_mount(const char *dev, const char *dir, const
> char *fsname, int flags, const
>      if (dev && strncmp(dev, "/dev/", 5) == 0)
>          device_open(dev + 5, DO_RDWR, &device);
>
> -    SCOPE_LOCK(mount_lock);
> -
>      /* Check if device or directory has already been mounted. */
> -    for (auto&& mp : mount_list) {
> -        if (!strcmp(mp->m_path, dir) ||
> -            (device && mp->m_dev == device)) {
> -            error = EBUSY;  /* Already mounted */
> -            goto err1;
> +    // We need to avoid the situation where after we already verified that
> +    // the mount point is free, but before we actually add it to
> mount_list,
> +    // another concurrent mount adds it. So we use a new mutex to ensure
> +    // that only on sys_mount() runs at a time. We reuse the existing
>

s/on/one/g


> +    // mount_lock for this purpose: If we take mount_lock and then do
> +    // lookups, this is lock order inversion and can result in deadlock.
> +    static mutex sys_mount_lock;
> +    SCOPE_LOCK(sys_mount_lock);
> +    WITH_LOCK(mount_lock) {
> +        for (auto&& mp : mount_list) {
> +            if (!strcmp(mp->m_path, dir) ||
> +                (device && mp->m_dev == device)) {
> +                error = EBUSY;  /* Already mounted */
> +                goto err1;
> +            }
>          }
>      }
>      /*
> @@ -192,7 +200,9 @@ sys_mount(const char *dev, const char *dir, const char
> *fsname, int flags, const
>      /*
>       * Insert to mount list
>       */
> -    mount_list.push_back(mp);
> +    WITH_LOCK(mount_lock) {
> +        mount_list.push_back(mp);
> +    }
>
>      return 0;   /* success */
>   err4:
> --
> 2.9.3
>
> --
> You received this message because you are subscribed to the Google Groups
> "OSv Development" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to osv-dev+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups "OSv 
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to osv-dev+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to