* Liu Bo ([email protected]) wrote:
> On Thu, May 30, 2019 at 02:50:05PM -0400, Vivek Goyal wrote:
> > On Thu, May 30, 2019 at 11:45:11AM -0700, Liu Bo wrote:
> > > On Thu, May 30, 2019 at 02:11:41PM -0400, Vivek Goyal wrote:
> > > > Doing posix locks within the guest kernel is not sufficient if a
> > > > file/dir is being shared by multiple guests. So we need the notion of
> > > > a daemon doing the locks, which are then visible to the rest of the
> > > > guests.
> > > > 
> > > > Given posix locks are per process, one cannot call the posix lock API
> > > > on the host, otherwise a bunch of basic posix lock properties are
> > > > broken. For example, if two processes (A and B) in the guest open the
> > > > file and take locks on different sections of the file, and one of the
> > > > processes closes its fd, that closes the fd on virtiofsd and all posix
> > > > locks on the file go away. This means if process A closes the fd, then
> > > > the locks of process B will go away too.
> > > >
> > > > Other similar problems exist too.
> > > > 
> > > > This patch set tries to emulate posix locks while using open file
> > > > description locks provided on Linux.
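
For anyone following along, the property this relies on can be seen in a
small standalone program. This is only a sketch and not code from the patch:
the helper names (ofd_try_wrlock, ofd_owner_demo) and the /tmp path are made
up for illustration, and it assumes a Linux host with OFD lock support. Each
open() of the same file gives a separate open file description, so each one
can stand in for one guest lock owner inside a single daemon process.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Fallback for builds where the feature macro was set too late (Linux value). */
#ifndef F_OFD_SETLK
#define F_OFD_SETLK 37
#endif

/* Try a non-blocking OFD write lock on [start, start + len).
 * Returns 0 on success or a positive errno value. */
static int ofd_try_wrlock(int fd, off_t start, off_t len)
{
    struct flock fl;

    assert(fd >= 0);
    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;
    fl.l_pid = 0;               /* must be 0 for OFD lock commands */
    return fcntl(fd, F_OFD_SETLK, &fl) == -1 ? errno : 0;
}

/* Returns 0 if OFD locks behave per open file description as expected. */
static int ofd_owner_demo(void)
{
    char path[] = "/tmp/ofd-demo-XXXXXX";
    int fd_a, fd_b, fd_c, e, ok = 1;

    fd_a = mkstemp(path);                 /* stands in for owner A */
    if (fd_a == -1)
        return -1;
    fd_b = open(path, O_RDWR);            /* stands in for owner B */
    fd_c = open(path, O_RDWR);            /* a third, probing owner */
    unlink(path);
    if (fd_b == -1 || fd_c == -1) {
        close(fd_a);
        if (fd_b != -1)
            close(fd_b);
        if (fd_c != -1)
            close(fd_c);
        return -1;
    }

    ok &= ofd_try_wrlock(fd_a, 0, 10) == 0;    /* A locks bytes 0..9 */
    ok &= ofd_try_wrlock(fd_b, 10, 10) == 0;   /* B locks bytes 10..19 */

    /* Same process, different description: B's range is a real conflict. */
    e = ofd_try_wrlock(fd_c, 10, 10);
    ok &= (e == EAGAIN || e == EACCES);

    close(fd_a);                               /* owner A goes away */

    /* B's lock survives A's close ... */
    e = ofd_try_wrlock(fd_c, 10, 10);
    ok &= (e == EAGAIN || e == EACCES);
    /* ... while A's range has been released. */
    ok &= ofd_try_wrlock(fd_c, 0, 10) == 0;

    close(fd_b);
    close(fd_c);
    return ok ? 0 : -1;
}
```

With classic F_SETLK in place of F_OFD_SETLK, none of these attempts would
conflict (one process owns all the locks) and close(fd_a) would drop B's lock
as well, which is the breakage described above.
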
> > > > 
> > > > Daemon provides two options (-o posix_lock, -o no_posix_lock) to enable
> > > > or disable posix locking in daemon. By default it is enabled.
> > > > 
> > > > There are a few issues though.
> > > > 
> > > > - GETLK() returns the pid of the process holding the lock. As we are
> > > >   emulating locks using OFD locks, which are not per process and don't
> > > >   return a pid, GETLK() in the guest does not return the process pid.
> > > > 
> > > > - As of now only F_SETLK is supported and not F_SETLKW. We can't block
> > > >   the thread in virtiofsd for an arbitrarily long duration as there is
> > > >   only one thread serving the queue. That means the unlock request will
> > > >   not make it to the daemon, and F_SETLKW will block forever and bring
> > > >   virtio-fs to a halt. This is a solvable problem though and will
> > > >   require significant changes in virtiofsd and the kernel. Left as a
> > > >   TODO item for now.
> > > 
> > > We've also seen this hang with flock()'s sleep mode. I was wondering
> > > if we could pthread_create a new thread to do the blocking lock.
> > 
> > One idea I was discussing with David Gilbert is: can we have multiple
> > threads serving the same virtqueue? That would allow us to block in the
> > same context as the caller.
> > 
> > This probably also means creating a separate virtqueue for sending down
> > requests which can block for an arbitrarily long amount of time, to make
> > sure deadlock does not happen.
> >
> 
> Right, with "separate virtqueue" + "multi threading" together, the
> problem should be addressed.

But it's hard enough that I was thinking of a special case for flock that
looks closer to the pthread_create approach.
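
Very roughly, and only as a sketch under assumptions: the struct and function
names below (blocking_flock_req, blocking_flock_worker, blocking_flock_demo)
are hypothetical and are not virtiofsd code, and the demo uses two fds in one
process to stand in for two lock owners. The point is just that the sleeping
flock() happens in its own thread, so the thread serving the queue stays free
to process the unlock that wakes it up.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical per-request context; none of these names come from virtiofsd. */
struct blocking_flock_req {
    int fd;      /* fd whose open file description will own the lock */
    int op;      /* LOCK_SH or LOCK_EX, without LOCK_NB */
    int result;  /* 0 on success, else errno; valid once the worker exits */
};

/* Worker thread: sleeps in flock() so the queue thread does not have to. */
static void *blocking_flock_worker(void *opaque)
{
    struct blocking_flock_req *r = opaque;
    int ret;

    assert(r->fd >= 0);
    do {
        ret = flock(r->fd, r->op);
    } while (ret == -1 && errno == EINTR);
    r->result = (ret == -1) ? errno : 0;
    /* A real daemon would send the reply for the request from here. */
    return NULL;
}

/* Demo: one fd holds the lock, a worker waits for it on a second fd, and the
 * calling thread (playing the queue thread) performs the unlock.
 * Returns 0 if the waiter ended up with the lock. */
static int blocking_flock_demo(void)
{
    char path[] = "/tmp/flock-demo-XXXXXX";
    struct blocking_flock_req req = { .fd = -1, .op = LOCK_EX, .result = -1 };
    pthread_t tid;
    int holder = mkstemp(path);

    if (holder == -1)
        return -1;
    req.fd = open(path, O_RDWR);
    unlink(path);
    if (req.fd == -1 || flock(holder, LOCK_EX | LOCK_NB) == -1 ||
        pthread_create(&tid, NULL, blocking_flock_worker, &req) != 0) {
        if (req.fd != -1)
            close(req.fd);
        close(holder);
        return -1;
    }

    /* The caller is not blocked, so it can still handle the unlock. */
    flock(holder, LOCK_UN);
    pthread_join(tid, NULL);

    close(req.fd);
    close(holder);
    return req.result;
}
```

A thread per sleeping request is crude, and it still needs some way to cancel
the waiter when the guest interrupts or drops the request, but it avoids
reworking how the queue itself is served.
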

Dave

> thanks,
> -liubo
> 
> > Thanks
> > Vivek
> > 
> > > 
> > > thanks,
> > > -liubo
> > > > 
> > > > Signed-off-by: Vivek Goyal <[email protected]>
> > > > ---
> > > >  contrib/virtiofsd/passthrough_ll.c |  185 ++++++++++++++++++++++++++++++++++++-
> > > >  1 file changed, 184 insertions(+), 1 deletion(-)
> > > > 
> > > > Index: qemu/contrib/virtiofsd/passthrough_ll.c
> > > > ===================================================================
> > > > --- qemu.orig/contrib/virtiofsd/passthrough_ll.c        2019-04-25 10:49:14.103386416 -0400
> > > > +++ qemu/contrib/virtiofsd/passthrough_ll.c     2019-05-30 14:02:55.598483536 -0400
> > > > @@ -58,6 +58,12 @@
> > > >  #include <gmodule.h>
> > > >  #include "seccomp.h"
> > > >  
> > > > +/* Keep track of inode posix locks for each owner. */
> > > > +struct lo_inode_plock {
> > > > +       uint64_t        lock_owner;
> > > > +       int     fd;     /* fd for OFD locks */
> > > > +};
> > > > +
> > > >  struct lo_map_elem {
> > > >         union {
> > > >                 struct lo_inode *inode;
> > > > @@ -86,6 +92,8 @@ struct lo_inode {
> > > >         struct lo_key key;
> > > >         uint64_t refcount; /* protected by lo->mutex */
> > > >         fuse_ino_t fuse_ino;
> > > > +       pthread_mutex_t mutex;
> > > > +       GHashTable *posix_locks; /* protected by lo_inode->mutex */
> > > >  };
> > > >  
> > > >  struct lo_cred {
> > > > @@ -105,6 +113,7 @@ struct lo_data {
> > > >         int norace;
> > > >         int writeback;
> > > >         int flock;
> > > > +       int posix_lock;
> > > >         int xattr;
> > > >         const char *source;
> > > >         double timeout;
> > > > @@ -133,6 +142,10 @@ static const struct fuse_opt lo_opts[] =
> > > >           offsetof(struct lo_data, flock), 1 },
> > > >         { "no_flock",
> > > >           offsetof(struct lo_data, flock), 0 },
> > > > +       { "posix_lock",
> > > > +         offsetof(struct lo_data, posix_lock), 1 },
> > > > +       { "no_posix_lock",
> > > > +         offsetof(struct lo_data, posix_lock), 0 },
> > > >         { "xattr",
> > > >           offsetof(struct lo_data, xattr), 1 },
> > > >         { "no_xattr",
> > > > @@ -362,13 +375,24 @@ static void lo_init(void *userdata,
> > > >                         fprintf(stderr, "lo_init: activating flock locks\n");
> > > >                 conn->want |= FUSE_CAP_FLOCK_LOCKS;
> > > >         }
> > > > +
> > > > +       if (conn->capable & FUSE_CAP_POSIX_LOCKS) {
> > > > +               if (lo->posix_lock) {
> > > > +                       if (lo->debug)
> > > > +                               fprintf(stderr, "lo_init: activating posix locks\n");
> > > > +                       conn->want |= FUSE_CAP_POSIX_LOCKS;
> > > > +               } else {
> > > > +                       if (lo->debug)
> > > > +                               fprintf(stderr, "lo_init: disabling posix locks\n");
> > > > +                       conn->want &= ~FUSE_CAP_POSIX_LOCKS;
> > > > +               }
> > > > +       }
> > > >         if ((lo->cache == CACHE_NONE && !lo->readdirplus_set) ||
> > > >             lo->readdirplus_clear) {
> > > >                 if (lo->debug)
> > > >                         fprintf(stderr, "lo_init: disabling readdirplus\n");
> > > >                 conn->want &= ~FUSE_CAP_READDIRPLUS;
> > > >         }
> > > > -
> > > >  }
> > > >  
> > > >  static void lo_getattr(fuse_req_t req, fuse_ino_t ino,
> > > > @@ -673,6 +697,8 @@ static int lo_do_lookup(fuse_req_t req,
> > > >                 newfd = -1;
> > > >                 inode->key.ino = e->attr.st_ino;
> > > >                 inode->key.dev = e->attr.st_dev;
> > > > +               pthread_mutex_init(&inode->mutex, NULL);
> > > > +               inode->posix_locks = g_hash_table_new(g_direct_hash, g_direct_equal);
> > > >  
> > > >                 pthread_mutex_lock(&lo->mutex);
> > > >                 inode->fuse_ino = lo_add_inode_mapping(req, inode);
> > > > @@ -1038,6 +1064,10 @@ static void unref_inode(struct lo_data *
> > > >         if (!inode->refcount) {
> > > >                 lo_map_remove(&lo->ino_map, inode->fuse_ino);
> > > >                  g_hash_table_remove(lo->inodes, &inode->key);
> > > > +               if (g_hash_table_size(inode->posix_locks)) {
> > > > +                       warn("Hash table is not empty\n");
> > > > +               }
> > > > +               g_hash_table_destroy(inode->posix_locks);
> > > >                 pthread_mutex_unlock(&lo->mutex);
> > > >                 close(inode->fd);
> > > >                 free(inode);
> > > > @@ -1379,6 +1409,131 @@ out:
> > > >                 fuse_reply_create(req, &e, fi);
> > > >  }
> > > >  
> > > > +/* Should be called with inode->mutex held */
> > > > +static struct lo_inode_plock *lookup_create_plock_ctx(struct lo_data *lo,
> > > > +                               struct lo_inode *inode, uint64_t lock_owner,
> > > > +                               pid_t pid, int *err)
> > > > +{
> > > > +       struct lo_inode_plock *plock;
> > > > +       char procname[64];
> > > > +       int fd;
> > > > +
> > > > +       plock = g_hash_table_lookup(inode->posix_locks,
> > > > +                                   GUINT_TO_POINTER(lock_owner));
> > > > +
> > > > +       if (plock)
> > > > +               return plock;
> > > > +
> > > > +       plock = malloc(sizeof(struct lo_inode_plock));
> > > > +       if (!plock) {
> > > > +               *err = ENOMEM;
> > > > +               return NULL;
> > > > +       }
> > > > +
> > > > +       /* Open another instance of file which can be used for ofd locks. */
> > > > +       sprintf(procname, "%i", inode->fd);
> > > > +
> > > > +       /* TODO: What if file is not writable? */
> > > > +       fd = openat(lo->proc_self_fd, procname, O_RDWR);
> > > > +       if (fd == -1) {
> > > > +               *err = errno;
> > > > +               free(plock);
> > > > +               return NULL;
> > > > +       }
> > > > +
> > > > +       plock->lock_owner = lock_owner;
> > > > +       plock->fd = fd;
> > > > +       g_hash_table_insert(inode->posix_locks,
> > > > +                           GUINT_TO_POINTER(plock->lock_owner), plock);
> > > > +       return plock;
> > > > +}
> > > > +
> > > > +static void lo_getlk(fuse_req_t req, fuse_ino_t ino,
> > > > +                    struct fuse_file_info *fi, struct flock *lock)
> > > > +{
> > > > +       struct lo_data *lo = lo_data(req);
> > > > +       struct lo_inode *inode;
> > > > +       struct lo_inode_plock *plock;
> > > > +       int ret, saverr = 0;
> > > > +
> > > > +       if (lo_debug(req))
> > > > +               fprintf(stderr, "lo_getlk(ino=%" PRIu64 ", flags=%d)"
> > > > +                              " owner=0x%lx, l_type=%d l_start=0x%lx"
> > > > +                             " l_len=0x%lx\n", ino, fi->flags, fi->lock_owner,
> > > > +                             lock->l_type, lock->l_start, lock->l_len);
> > > > +
> > > > +       inode = lo_inode(req, ino);
> > > > +       if (!inode) {
> > > > +               fuse_reply_err(req, EBADF);
> > > > +               return;
> > > > +       }
> > > > +
> > > > +       pthread_mutex_lock(&inode->mutex);
> > > > +       plock = lookup_create_plock_ctx(lo, inode, fi->lock_owner, lock->l_pid,
> > > > +                       &ret);
> > > > +       if (!plock) {
> > > > +               pthread_mutex_unlock(&inode->mutex);
> > > > +               fuse_reply_err(req, ret);
> > > > +               return;
> > > > +       }
> > > > +
> > > > +       ret = fcntl(plock->fd, F_OFD_GETLK, lock);
> > > > +       if (ret == -1)
> > > > +               saverr = errno;
> > > > +       pthread_mutex_unlock(&inode->mutex);
> > > > +
> > > > +       if (saverr)
> > > > +               fuse_reply_err(req, saverr);
> > > > +       else
> > > > +               fuse_reply_lock(req, lock);
> > > > +}
> > > > +
> > > > +static void lo_setlk(fuse_req_t req, fuse_ino_t ino,
> > > > +                    struct fuse_file_info *fi, struct flock *lock, int sleep)
> > > > +{
> > > > +       struct lo_data *lo = lo_data(req);
> > > > +       struct lo_inode *inode;
> > > > +       struct lo_inode_plock *plock;
> > > > +       int ret, saverr = 0;
> > > > +
> > > > +       if (lo_debug(req))
> > > > +               fprintf(stderr, "lo_setlk(ino=%" PRIu64 ", flags=%d)"
> > > > +                       " cmd=%d pid=%d owner=0x%lx sleep=%d l_whence=%d"
> > > > +                       " l_start=0x%lx l_len=0x%lx\n", ino, fi->flags,
> > > > +                       lock->l_type, lock->l_pid, fi->lock_owner, sleep,
> > > > +                       lock->l_whence, lock->l_start, lock->l_len);
> > > > +
> > > > +       if (sleep) {
> > > > +               fuse_reply_err(req, EOPNOTSUPP);
> > > > +               return;
> > > > +       }
> > > > +
> > > > +       inode = lo_inode(req, ino);
> > > > +       if (!inode) {
> > > > +               fuse_reply_err(req, EBADF);
> > > > +               return;
> > > > +       }
> > > > +
> > > > +       pthread_mutex_lock(&inode->mutex);
> > > > +       plock = lookup_create_plock_ctx(lo, inode, fi->lock_owner, lock->l_pid,
> > > > +                       &ret);
> > > > +
> > > > +       if (!plock) {
> > > > +               pthread_mutex_unlock(&inode->mutex);
> > > > +               fuse_reply_err(req, ret);
> > > > +               return;
> > > > +       }
> > > > +
> > > > +       /* TODO: Is it alright to modify flock? */
> > > > +       lock->l_pid = 0;
> > > > +       ret = fcntl(plock->fd, F_OFD_SETLK, lock);
> > > > +       if (ret == -1) {
> > > > +               saverr = errno;
> > > > +       }
> > > > +       pthread_mutex_unlock(&inode->mutex);
> > > > +       fuse_reply_err(req, saverr);
> > > > +}
> > > > +
> > > >  static void lo_fsyncdir(fuse_req_t req, fuse_ino_t ino, int datasync,
> > > >                         struct fuse_file_info *fi)
> > > >  {
> > > > @@ -1476,6 +1631,31 @@ static void lo_flush(fuse_req_t req, fus
> > > >  {
> > > >         int res;
> > > >         (void) ino;
> > > > +       struct lo_inode *inode;
> > > > +       struct lo_inode_plock *plock;
> > > > +
> > > > +       inode = lo_inode(req, ino);
> > > > +       if (!inode) {
> > > > +               fuse_reply_err(req, EBADF);
> > > > +               return;
> > > > +       }
> > > > +
> > > > +       /* An fd is going away. Cleanup associated posix locks */
> > > > +       pthread_mutex_lock(&inode->mutex);
> > > > +       plock = g_hash_table_lookup(inode->posix_locks,
> > > > +                                   GUINT_TO_POINTER(fi->lock_owner));
> > > > +       if (plock) {
> > > > +               g_hash_table_remove(inode->posix_locks,
> > > > +                                   GUINT_TO_POINTER(fi->lock_owner));
> > > > +               /*
> > > > +                * We had used open() for locks and had only one fd. So
> > > > +                * closing this fd should release all OFD locks.
> > > > +                */
> > > > +               close(plock->fd);
> > > > +               free(plock);
> > > > +       }
> > > > +       pthread_mutex_unlock(&inode->mutex);
> > > > +
> > > >         res = close(dup(lo_fi_fd(req, fi)));
> > > >         fuse_reply_err(req, res == -1 ? errno : 0);
> > > >  }
> > > > @@ -1963,6 +2143,8 @@ static struct fuse_lowlevel_ops lo_oper
> > > >         .releasedir     = lo_releasedir,
> > > >         .fsyncdir       = lo_fsyncdir,
> > > >         .create         = lo_create,
> > > > +       .getlk          = lo_getlk,
> > > > +       .setlk          = lo_setlk,
> > > >         .open           = lo_open,
> > > >         .release        = lo_release,
> > > >         .flush          = lo_flush,
> > > > @@ -2189,6 +2371,7 @@ int main(int argc, char *argv[])
> > > >         struct fuse_cmdline_opts opts;
> > > >         struct lo_data lo = { .debug = 0,
> > > >                               .writeback = 0,
> > > > +                             .posix_lock = 1,
> > > >                               .proc_self_fd = -1,
> > > >         };
> > > >         struct lo_map_elem *root_elem;
> > > > 
> > > > _______________________________________________
> > > > Virtio-fs mailing list
> > > > [email protected]
> > > > https://www.redhat.com/mailman/listinfo/virtio-fs
--
Dr. David Alan Gilbert / [email protected] / Manchester, UK
